ADAPTIVE POLICIES FOR A SYSTEM OF COMPETING QUEUES I:
CONVERGENCE RESULTS FOR THE LONG-RUN AVERAGE COST


by 

Adam Shwartz 1 and Armand M. Makowski 2


ABSTRACT 


This paper considers a system of discrete-time queues competing for the attention of a single
geometric server. The problem of implementing a given Markov stationary service allocation policy $g$
through an adaptive allocation policy $\alpha$ is posed, and convergence of the long-run average cost under
such an adaptive policy $\alpha$ to the long-run average cost under the policy $g$ is investigated. Such a
question typically arises in the context of Markov decision problems associated with this queueing
system, say when some of the model parameters are not available [1, 20], or when the optimality
criterion incorporates constraints [14, 21, 20].

Conditions are given so that the long-run average cost under the policy $\alpha$ converges to the
corresponding cost under the policy $g$, provided a natural condition on the relative asymptotic
behavior of the policies $g$ and $\alpha$ holds. Applications of the results developed here are discussed
in a companion paper [20]. However, the ideas of this paper are of independent interest and should
prove useful in studying implementation and adaptive control issues for broad classes of Markov
decision problems [12].
1. INTRODUCTION:

Consider the following system of $K+1$ infinite-capacity queues that compete for the use
of a single server: Time is slotted, with the service requirement of each customer corresponding
exactly to one time slot. At the beginning of each time slot, the controller gives priority
to one of the queues according to some prespecified dynamic priority assignment, on the basis
of available information, and the selected queue is given service attention during that slot.
However, for a variety of reasons ranging from server failure to exogenous interference,
the service will fail with positive probability; in that case, the service of that customer will
be rescheduled at a later time in accordance with the service allocation policy. When the service
does not fail in a given time slot, the customer is declared serviced and leaves the system
at the end of the slot. In the present paper, the failures are assumed to be generated through
independent Bernoulli processes, with possibly class-dependent parameters, independently
of the arrival mechanism. New customers may arrive in batches, which are modelled as
an arbitrary $(K+1)$-dimensional renewal process, to capture partial correlations between
arrivals from different classes in a given slot.

For this system of competing queues, the selection of a service allocation strategy with
good performance properties has been discussed in a series of recent papers: Baras, Dorsey
and Makowski [2] discussed the model in the case $K = 1$ and showed the optimality of the
$\mu c$-rule when the cost is linear in the queue sizes. This result was further extended to an
arbitrary number of customer classes, under weaker statistical assumptions on the arrival
stream, in the works of Baras, Ma and Makowski [3] and of Buyukkoc, Varaiya and Walrand [4].

In [14], Nain and Ross considered the situation where several types of traffic, e.g.,
voice, video and data, compete for the use of a single synchronous communication channel.
They formulate this situation as a system of $K+1$ discrete-time queues that compete for the
attention of a single server, and solve for the service allocation strategy that minimizes the
long-run average of a linear expression in the queue sizes of $K$ customer classes, under the
constraint that the long-run average queue size of the remaining customer class does not
exceed a certain value. Extending some of the optimality results from Baras, Ma and
Makowski [3], they show that if the constraint can be met, then the following policy is
optimal: There exists a pair of static work-conserving service assignment policies (of which
$\mu c$-rules are only one description), say $f^0$ and $f^1$, with the property that if there are
customers in the system, a biased coin is flipped with bias $\eta$, and channel right is implemented
according to the outcome via $f^0$ and $f^1$ with probability $\eta$ and $1-\eta$, respectively.

Typically, as these two examples indicate, analysis will identify a Markov stationary
policy $g$ which exhibits optimality properties with respect to a long-run average cost criterion.
Unfortunately, this Markov stationary policy is usually not readily implementable,
despite its strong structural properties, with the encountered difficulties falling essentially
into one of the two following categories:
(i): The form of the policy $g$ is a function of the various parameters determining the
statistical description of the model. The actual values of these parameters are often not
available to the decision-maker and need to be estimated as part of the system operation.
Such a situation was considered by Baras, Dorsey and Makowski [1] for a long-run average
cost linear in queue sizes, when the failure rates were not available.

(ii): Even in the event the actual parameter values are available, the Markov stationary
policy $g$ may still not be implementable due to computational difficulties inherent in its
definition. The situation treated by Nain and Ross [14] is a good case in point, for non-trivial
off-line computations are required in order to compute the actual value of the optimal bias
$\eta^*$ and implement the seemingly simple randomized policy discussed earlier.

Various methods have been proposed in the literature to overcome these implementation
difficulties. In most cases, the solution amounts to generating an alternate policy $\alpha$ through
the Certainty Equivalence Principle via a specific estimation scheme that exploits the specific
structure of the Markov stationary policy $g$. Such a policy $\alpha$ will be referred to as an adaptive
implementation of the policy $g$, thus broadening the technical meaning of the word
“adaptive” as understood in the literature on the non-Bayesian adaptive control problem for
Markov chains [11].

In this context, it is then natural to investigate when the performance measures of
interest coincide under these two policies. The main result of this paper, given in Theorem
3.1, can be viewed as an extension of a result by Mandl [13] to randomized strategies and
countable state spaces. It gives sufficient conditions for the performance measures, when
taken in the long-run average sense, to coincide under the two policies. Although this result
is discussed in the context of competing queues systems, the methodology has broader applicability
to various issues in the theory of Markov decision processes. The ideas and results
presented here are applied to various problems in a companion paper [20].

The paper is organized as follows: The model and basic assumptions are described in
Section 2. In Section 3, some motivation is provided for the issues discussed in this paper,
and the main convergence result for the cost is given as Theorem 3.1. Its proof underlies most
of the material discussed in subsequent sections. The structure of passage times to the empty
state is studied in Section 4 under the action of arbitrary admissible non-idling strategies,
and the results are exploited in Section 5 to derive the statistical properties of the busy
cycles. In Section 6, bounds on various moments of the queue size process are established
through a renewal argument that exploits the statistical properties of the busy periods; moreover,
a representation is obtained for the cost, in terms of the invariant measure associated
with the policy $g$. The proof of the main result, Theorem 3.1, is then an easy consequence of
a useful extension to a well-known result of Mandl [13] on the optimality of adaptive policies.
This topic is discussed in Section 7.


A few words on the notation used throughout the paper: The set of all non-negative
integers is denoted by $\mathbb{N}$ while $\mathbb{R}$ stands for the set of all real numbers. An element $x$ in
$\mathbb{R}^{K+1}$ will sometimes be written as a $(K+1)$-tuple $(x_0, x_1, \ldots, x_K)$, with the notation

\[ |x| := \sum_{k=0}^{K} |x_k| . \]

As a natural convention, the $k$-th component of any element of $\mathbb{R}^{K+1}$ is
denoted by the same symbol as this element, but subscripted by $k$, $0 \le k \le K$, with a similar
convention for random variables. In particular, the element in $\mathbb{R}^{K+1}$ whose components are
all zero is also denoted by 0. The standard basis for $\mathbb{R}^{K+1}$ is denoted by $B = \{e_k\}_{k=0}^{K}$, while
$S$ is the standard $K$-simplex, i.e.,

\[ S := \Big\{ p \in \mathbb{R}^{K+1} : \sum_{k=0}^{K} p_k = 1 \ \text{and} \ 0 \le p_k \le 1, \ 0 \le k \le K \Big\}. \]

The indicator function of a set $A$ is denoted by $I(A)$ and the Kronecker delta is denoted by
$\delta(\cdot,\cdot)$, with $\delta(a,b) := 1$ if $a = b$ and $\delta(a,b) := 0$ otherwise. For any mapping
$h : \mathbb{N}^{K+1} \to \mathbb{R}$, it is convenient to pose $|h| := \sup_x |h(x)|$.


2.  MODEL  AND  ASSUMPTIONS: 

The  basic  random  variables 

In this paper, all probabilistic elements are defined on a single sample space $\Omega$ equipped
with the $\sigma$-field of events $F$. This sample space carries the basic random variables (RV's) $\Xi$,
$\{U(n)\}_1^\infty$, $\{A(n)\}_1^\infty$ and $\{B(n)\}_1^\infty$, which take values in $\mathbb{N}^{K+1}$, $B$, $\mathbb{N}^{K+1}$ and
$\{0,1\}^{K+1}$, respectively. It is convenient to introduce the information RV's $\{H(n)\}_1^\infty$, which
are recursively defined by $H(1) := \Xi$ and

\[ H(n+1) := (H(n), U(n), A(n), B(n)), \qquad n = 1, 2, \ldots \tag{2.1} \]

and which take values in the corresponding information spaces $\{\mathbb{H}_n\}_1^\infty$, where
$\mathbb{H}_1 := \mathbb{N}^{K+1}$ and $\mathbb{H}_{n+1} := \mathbb{H}_n \times B \times \mathbb{N}^{K+1} \times \{0,1\}^{K+1}$ for all $n = 1, 2, \ldots$.

These quantities have a ready interpretation in the context of the situation described in
the introduction: The number of customers initially in the $k$-th queue is set at $\Xi_k$, and for
each $n = 1, 2, \ldots$, the state of the system is represented by a RV $X(n)$ with integer components,
with the interpretation that at the beginning of the slot $[n, n+1)$, $X_k(n)$ customers are
stored in the $k$-th buffer, including the one receiving service. Thus at that time,

(i): control action $U(n)$ is selected, with the convention that $U_k(n) = 1$ (resp.
$U_k(n) = 0$) if the $k$-th queue is (resp. is not) given service attention during that slot;

(ii): new packets arrive into the system according to the RV $A(n)$, in that $A_k(n)$ new
customers join the $k$-th queue, and

(iii): completions of transmission are encoded in the binary RV $B(n)$; here $B_k(n) = 1$
(resp. $B_k(n) = 0$) signifies successful completion (resp. abortion) of service for the $k$-th queue,
conditioned on it being served.
As a result, the successive system states or queue sizes form a sequence $\{X(n)\}_1^\infty$ of
$\mathbb{N}^{K+1}$-valued RV's which are generated componentwise through the recursion

\[ X_k(n+1) = X_k(n) + A_k(n) - I[X_k(n) \neq 0]\, U_k(n) B_k(n), \qquad 0 \le k \le K, \quad n = 1, 2, \ldots \tag{2.2} \]

with $X(1) := \Xi$.

At the beginning of each time slot $[n, n+1)$, the channel controller has access to the initial
queue sizes $\Xi$, the past arrival pattern $A(i)$, $1 \le i < n$, the past decisions
$U(i)$, $1 \le i < n$, and the service history $B(i)$, $1 \le i < n$. Thus, the channel controller has
knowledge of the RV $H(n)$, which is used to generate the control value $U(n)$ implemented
in the slot $[n, n+1)$. The selection of this control value is done according to a prespecified
mechanism, which may be either deterministic or random.
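
The recursion (2.2) can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: the names `step`, `simulate`, `policy` and `arrival_sampler` are hypothetical, the policy is restricted to a function of the current state (a Markov policy), and the service outcomes are drawn as independent Bernoulli variables with the given success rates.

```python
import random

def step(x, u, a, b):
    """One slot of recursion (2.2).

    x, a, b are length-(K+1) sequences (queue sizes, arrivals, service outcomes);
    u is the index of the queue given service attention in this slot.
    A departure occurs only from the served queue, only if it is non-empty,
    and only if its service outcome is a success.
    """
    return [xk + ak - (1 if (k == u and xk != 0) else 0) * bk
            for k, (xk, ak, bk) in enumerate(zip(x, a, b))]

def simulate(policy, x0, mu, arrival_sampler, n_slots, rng):
    """Generate a sample path of queue sizes over n_slots slots.

    policy(x, rng) returns the served queue index (assumed Markov here);
    arrival_sampler(rng) returns one arrival vector; mu holds success rates.
    """
    x = list(x0)
    path = [tuple(x)]
    for _ in range(n_slots):
        u = policy(x, rng)
        a = arrival_sampler(rng)
        b = [1 if rng.random() < m else 0 for m in mu]  # Bernoulli service outcomes
        x = step(x, u, a, b)
        path.append(tuple(x))
    return path
```

A history-dependent policy would need the full information vector rather than the current state; the sketch keeps only the state for brevity.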


The  probabilistic  structure 

Since randomized strategies are allowed, an admissible control policy $\pi$ is defined as any
collection $\{\pi_n\}_1^\infty$ of mappings $\pi_n : \mathbb{H}_n \to S$, with the interpretation that at times
$n = 1, 2, \ldots$, the $k$-th queue is given service attention with probability $\pi_n(k; h_n)$ whenever the
information vector $h_n$ is available to the system controller. Denote the collection of all such
admissible policies by $\Pi$.

For each $n = 1, 2, \ldots$, let $F_n$ denote the $\sigma$-field on the sample space $\Omega$ generated by the
RV $H(n)$, with $F_n \subset F_{n+1}$.

Let $q_\Xi(\cdot)$ and $q(\cdot)$ be two probability distributions on $\mathbb{N}^{K+1}$, with $q(0) < 1$, and fix a
service rate vector $\mu$ in $(0,1]^{K+1}$. The model is now completely specified by postulating the
existence of a family $\{P^\pi, \ \pi \in \Pi\}$ of probability measures on the $\sigma$-field $F$ which satisfies
the requirements (R1)-(R3) below, i.e., for every policy $\pi$ in $\Pi$,

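(R1): For all $x$ in $\mathbb{N}^{K+1}$,

\[ P^\pi[\Xi = x] := q_\Xi(x), \]

(R2): For all $a$ in $\mathbb{N}^{K+1}$ and $b$ in $\{0,1\}^{K+1}$,

\[ P^\pi[A(n) = a, B(n) = b \mid F_n \vee \sigma\{U(n)\}] := P^\pi[A(n) = a]\, P^\pi[B(n) = b]
   := q(a) \prod_{k=0}^{K} \big( b_k \mu_k + (1-b_k)(1-\mu_k) \big), \qquad n = 1, 2, \ldots \]

and

(R3): For all $e_k$, $0 \le k \le K$, in $B$,

\[ P^\pi[U(n) = e_k \mid F_n] := P^\pi[U(n) = e_k \mid H(n)] := \pi_n(k; H(n)), \qquad n = 1, 2, \ldots \]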

The existence of a sample space $(\Omega, F)$ that carries such a family of probability measures
$\{P^\pi, \ \pi \in \Pi\}$ is easily established via the Kolmogorov extension theorem, by taking $\Omega$ to
be the canonical space

\[ \Omega := \mathbb{N}^{K+1} \times \big( B \times \mathbb{N}^{K+1} \times \{0,1\}^{K+1} \big)^\infty \]

equipped with its natural $\sigma$-field. This modelling approach for the Markov decision process under
consideration was adopted in [21], to which the reader is referred for additional information.

The reader will readily check that under each probability measure $P^\pi$, the following
properties hold true.

(P1): The $\mathbb{N}^{K+1}$-valued RV $\Xi$ and the sequences of RV's $\{A(n)\}_1^\infty$ and $\{B(n)\}_1^\infty$ are
mutually independent;

(P2): The sequences $\{B_k(n)\}_1^\infty$ of $\{0,1\}$-valued RV's are mutually independent Bernoulli
sequences with parameters $\mu_k$, $0 \le k \le K$;

(P3): The $\mathbb{N}^{K+1}$-valued RV's $\{A(n)\}_1^\infty$ form a sequence of i.i.d. RV's, with a common
distribution $q(\cdot)$; and

(P4): The probability transitions have the form

\[ P^\pi[X(n+1) = y \mid F_n] = p(X(n), y; \pi_n(H(n))), \tag{2.3} \]

where, for all $q$ in $S$,

\[ p(x, y; q) := \sum_{k=0}^{K} q_k Q_k(x, y), \tag{2.4} \]

as $x$ and $y$ range over $\mathbb{N}^{K+1}$, with the definitions

\[ Q_k(x, y) := P^\pi\big[ x_k + A_k(n) - I[x_k \neq 0]\, B_k(n) = y_k ; \ x_j + A_j(n) = y_j, \ 0 \le j \le K, \ j \neq k \big] \tag{2.5} \]

for all $0 \le k \le K$. Note that the right-hand sides of (2.5) are independent of $n$ and of the
policy $\pi$ owing to the assumptions made earlier.

For future notational use, it will be convenient to assume that the sample space $\Omega$ carries
an additional $\mathbb{N}^{K+1}$-valued RV $A(\infty)$ which is distributed according to $q(\cdot)$ and is
independent of all other basic RV's introduced so far, under the probability measure associated
with any policy $\pi$ in $\Pi$.

Several  families  of  policies 

Several subclasses of policies within $\Pi$ will be of interest in the sequel.

A policy $\pi$ in $\Pi$ is said to be a Markov or memoryless policy if there exists a family
$\{g_n\}_1^\infty$ of mappings $g_n : \mathbb{N}^{K+1} \to S$ such that

\[ \pi_n(H(n)) = g_n(X(n)) \quad P^\pi\text{-a.s.} \qquad n = 1, 2, \ldots \tag{2.6} \]

with $\{X(n)\}_1^\infty$ generated through the recursion (2.2). In the event all the mappings $\{g_n\}_1^\infty$
are identical to a given mapping $g : \mathbb{N}^{K+1} \to S$, the Markov policy $\pi$ is termed stationary and
can be identified with the mapping $g$ itself, as will be done repeatedly in the sequel.

A policy $\pi$ in $\Pi$ will be said to be a pure strategy if there exists a family $\{f_n\}_1^\infty$ of
mappings $f_n : \mathbb{H}_n \to B$ such that for all $0 \le k \le K$,

\[ \pi_n(k; H(n)) = \delta(e_k, f_n(H(n))) \quad P^\pi\text{-a.s.} \qquad n = 1, 2, \ldots \tag{2.7} \]

A pure policy $\pi$ can thus be identified with the sequence of deterministic mappings $\{f_n\}_1^\infty$.
A pure Markov stationary policy $\pi$ in $\Pi$ is thus fully characterized by a single mapping
$f : \mathbb{N}^{K+1} \to B$, which is substituted for it in the notation.

A policy $\pi$ in $\Pi$ is said to be non-idling or work-conserving whenever for all $0 \le k \le K$,
the condition

\[ \pi_n(k; H(n)) > 0 \ \text{implies either} \ X_k(n) \neq 0 \ \text{or} \ X(n) = 0, \qquad n = 1, 2, \ldots \tag{2.8} \]

holds true $P^\pi$-a.s., in which case the $P^\pi$-a.s. equality

\[ \sum_{k=0}^{K} U_k(n)\, I[X_k(n) \neq 0] = 1 - I[X(n) = 0], \qquad n = 1, 2, \ldots \tag{2.9} \]

necessarily follows.


3.  CONVERGENCE  FOR  THE  LONG-RUN  AVERAGE  COST: 

Let $\lambda_k$ be the first moment of the sequence of i.i.d. RV's $\{A_k(n)\}_1^\infty$, $0 \le k \le K$, and for
future use, define the traffic coefficient $\rho$ to be

\[ \rho := \sum_{k=0}^{K} \frac{\lambda_k}{\mu_k} . \tag{3.1} \]

Throughout this paper, the discussion is carried out under the assumption that $\rho < 1$, which
expresses stability of the queueing system under any non-idling policy (as discussed in Sections
4-5).
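
As a small illustration, the traffic coefficient of (3.1) is the sum over classes of the mean arrivals per slot divided by the service success rate. The function name `traffic_coefficient` and the numerical values below are hypothetical and only serve to show the stability check.

```python
def traffic_coefficient(lam, mu):
    """Sum over classes of (mean arrivals per slot) / (service success rate), as in (3.1)."""
    if len(lam) != len(mu):
        raise ValueError("lam and mu must have one entry per customer class")
    return sum(l / m for l, m in zip(lam, mu))

# Hypothetical two-class example: the standing assumption requires a value below 1.
rho = traffic_coefficient([0.2, 0.3], [0.5, 1.0])
is_stable = rho < 1.0
```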

Let $c$ denote a mapping $\mathbb{N}^{K+1} \to \mathbb{R}$ and for any admissible policy $\pi$ in $\Pi$, pose

\[ J(\pi) := \lim_{n \uparrow \infty} \frac{1}{n} E^\pi \sum_{i=1}^{n} c(X(i)) \tag{3.2} \]

with the usual interpretation that the quantity $J(\pi)$ is a measure of system performance
when the policy $\pi$ is in use. Analysis often identifies a Markov stationary policy $g$ in $\Pi$
which exhibits suitable performance properties with respect to the cost function (3.2). Various
examples are now discussed in some detail so as to motivate the developments of the
paper.

Some  examples 

When the cost-per-stage is linear, i.e., for all $x$ in $\mathbb{N}^{K+1}$,

\[ c(x) = \sum_{k=0}^{K} c_k x_k \tag{3.3} \]

with $c_k > 0$, $0 \le k \le K$, several authors [2, 3, 4] showed that the $\mu c$-rule minimizes (3.2)-(3.3)
over the class of all admissible policies $\Pi$. Here the $\mu c$-rule is the non-randomized Markov
stationary policy $g$ given by

\[ g(k; x) = 1 \ \text{if} \ x_i = 0 \ \text{for} \ 0 \le i < k \ \text{and} \ x_k \neq 0, \qquad 0 \le k \le K \tag{3.4} \]

for all $x \neq 0$ in $\mathbb{N}^{K+1}$, under the convenient assumption that
$\mu_0 c_0 \ge \mu_1 c_1 \ge \cdots \ge \mu_K c_K$. It is clear that implementation of this policy is
conditioned on knowledge of the rate parameters $\mu_k$, $0 \le k \le K$. The situation where such
knowledge is not available was studied by Baras, Dorsey and Makowski [1] through use of the
Certainty Equivalence Principle; the proposed adaptive $\mu c$-rule $\alpha$ based on the maximum
likelihood estimates for the rate parameters was shown to be optimal in that $J(\alpha) = J(g)$.
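
The priority rule (3.4) and its certainty-equivalence variant can be sketched as follows. This is a minimal illustration, not the scheme of [1]: the function names are hypothetical, the rule is written without assuming the classes are pre-sorted (it serves the non-empty queue with the largest product of rate and cost, lowest index on ties, which reduces to (3.4) under the ordering assumed there), returning `None` on the empty state is a convention chosen here, and the rate estimate is the plain Bernoulli maximum likelihood ratio of successes to attempts with an assumed default before any attempt.

```python
def mu_c_rule(x, mu, c):
    """Index of the queue served under the mu-c rule, or None if the system is empty."""
    nonempty = [k for k, xk in enumerate(x) if xk != 0]
    if not nonempty:
        return None  # convention of this sketch: nothing to serve
    # Largest mu_k * c_k wins; ties go to the lowest index.
    return max(nonempty, key=lambda k: (mu[k] * c[k], -k))

def mle_rates(successes, attempts, default=1.0):
    """Bernoulli maximum likelihood estimates of the service rates.

    `default` is an assumed placeholder used for classes not yet served.
    """
    return [s / a if a > 0 else default for s, a in zip(successes, attempts)]

def adaptive_mu_c_rule(x, successes, attempts, c):
    """Certainty equivalence: apply the mu-c rule with estimated rates."""
    return mu_c_rule(x, mle_rates(successes, attempts), c)
```

Ties and the treatment of unserved classes are design choices of the sketch; an actual adaptive scheme would need to guarantee that every class keeps being sampled.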

For each $V$ in $\mathbb{R}$, consider now the set $\Pi_V$ of all admissible control policies in $\Pi$ which
satisfy the constraint

\[ J(\pi) \le V. \tag{3.5} \]

With the interpretation given earlier for the quantity $J(\pi)$, desirable system behavior may
then be conveniently expressed in the form of a constraint (3.5), in that only admissible policies
in $\Pi_V$ need to be considered when running the system. This viewpoint was taken by
Nain and Ross [14] in the study of a simple problem of channel allocation. There, as in many
other Markov decision problems with constraints [17], Lagrangian arguments often reduce
the search for a constrained (optimal) policy to finding a Markov stationary policy $g$ in $\Pi$
that saturates the constraint, i.e.,

\[ J(g) = V. \tag{3.6} \]

It should be pointed out that this last problem is of independent interest, for it can be viewed
more generally as one of steering the cost (3.2) to a particular value $V$ determined through
various design considerations.

This line of argument typically proceeds by identifying two Markov stationary policies
in $\Pi$, possibly randomized, say $f^0$ and $f^1$, with the property that

\[ J(f^0) \le V \le J(f^1). \tag{3.7} \]

It then remains to construct from the policies $f^0$ and $f^1$ an admissible policy $g$ in $\Pi$ that
satisfies (3.6). This construction is often achieved by randomizing between the policies $f^0$
and $f^1$. To that end, consider for any $\eta$ with $0 \le \eta \le 1$, the randomized Markov stationary
policy $f^\eta$ with bias $\eta$ generated through the mapping $f^\eta : \mathbb{N}^{K+1} \to S$, where

\[ f^\eta(x) := (1-\eta) f^0(x) + \eta f^1(x) \tag{3.8} \]

for all $x$ in $\mathbb{N}^{K+1}$. Note that for $\eta = 0$ (resp. $\eta = 1$), $f^\eta$ is identical to the original policy $f^0$
(resp. $f^1$). If the mapping $\eta \to J(f^\eta)$ is strictly monotone and continuous on the interval
$[0,1]$, then exactly one randomized strategy $f^\eta$ meets the constraint, and its bias value $\eta^*$ is
the unique solution of the equation

\[ J(f^\eta) = V, \qquad \eta \ \text{in} \ [0,1], \tag{3.9} \]

whence the identification $g = f^{\eta^*}$ may take place.
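
The randomization between two policies can be illustrated with a short sketch. It assumes the convex combination is weighted so that bias 0 recovers the first policy and bias 1 recovers the second, as the remark following (3.8) states; the two static priority policies for two queues and their behavior on the empty state are hypothetical choices made only for the example.

```python
def mixture(f0, f1, eta, x):
    """Service probabilities of the randomized policy with bias eta at state x.

    f0(x) and f1(x) return probability vectors over the queues; the result is
    their convex combination (1 - eta) * f0(x) + eta * f1(x).
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("bias must lie in [0, 1]")
    return [(1.0 - eta) * p0 + eta * p1 for p0, p1 in zip(f0(x), f1(x))]

# Hypothetical static priority policies for two queues (non-idling):
def priority_to_0(x):
    return [1.0, 0.0] if x[0] != 0 or x[1] == 0 else [0.0, 1.0]

def priority_to_1(x):
    return [0.0, 1.0] if x[1] != 0 or x[0] == 0 else [1.0, 0.0]
```

When only one queue is non-empty both priority policies serve it, so the mixture is deterministic there and remains non-idling for every bias value.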

The determination of the optimal bias value $\eta^*$ seemingly requires the evaluation of the
expression $J(f^\eta)$ for all values of $\eta$ in the unit interval $[0,1]$. This is a non-trivial task even
in the simplest of situations, when $K = 1$ and the policies $f^0$ and $f^1$ are static priority
assignments, as was the case in the model discussed by Nain and Ross [14]. As already
pointed out there, for $0 < \eta < 1$, the two competing queues can be interpreted (under the
probability measure $P^\eta$ associated with the policy $f^\eta$) as a two processor-sharing system of the
type studied by Fayolle and Iasnogorodski [7]. The computation of the probability generating
function in equilibrium reduces to the solution of a Riemann-Hilbert problem, from which
the optimal bias $\eta^*$ could in principle be determined numerically as a function of the arrival
statistics $q(\cdot)$ and of the rate vector $\mu$. Moreover, there has been no success in extending
these methods to the higher dimensional case $K \ge 2$. In short, even if the statistical model
parameters were available to the decision-maker, the evaluation of $\eta^*$ seems to constitute a
formidable computational undertaking [14].

To circumvent these difficulties, it seems natural to find alternate implementations of
the randomized policy $g$ identified through the analysis. To fix ideas, suppose both policies
$f^0$ and $f^1$ to be implementable. The very definition of the policy $f^\eta$ naturally suggests
schemes where a sequence of $[0,1]$-valued RV's $\{\eta(n)\}_1^\infty$, acting as estimates for the optimal
bias value $\eta^*$, are substituted for it in the definition (3.8). The adaptive non-idling policy $\alpha$
can be formally defined through the mappings $\{\alpha_n\}_1^\infty$ with

\[ \alpha_n(H(n)) := f^{\eta(n)}(X(n)), \qquad n = 1, 2, \ldots \tag{3.10} \]

Needless to say, the estimates $\{\eta(n)\}_1^\infty$ need to be generated, possibly via some recursive
algorithm, so as to ensure that the corresponding policy $\alpha$ meets the constraint (3.6).

An example of such a scheme was proposed by Shwartz and Makowski [21] for the two
competing queues situation treated by Nain and Ross [14] on the basis of well-known ideas
from the theory of Stochastic Approximations, of which the Robbins-Monro scheme is the
archetypical example [16]. The key idea being to solve on-line the constraint equation (3.6),
the proposed scheme generates a sequence of bias values $\{\eta(n)\}_1^\infty$ through the recursion

\[ \eta(n+1) = \big[ \eta(n) + a_n \big( V - c(X(n+1)) \big) \big]_0^1, \qquad n = 1, 2, \ldots \tag{3.11} \]

with $\eta(0)$ given in $[0,1]$, the convention being that $[x]_0^1 := 0 \vee (x \wedge 1)$ for all $x$ in $\mathbb{R}$. As with
most stochastic approximation algorithms, the step sizes $\{a_n\}_1^\infty$ form an $\mathbb{R}_+$-valued
sequence which satisfies the conditions

\[ 0 < a_n \to 0, \qquad \sum_{n=1}^{\infty} a_n = \infty, \qquad \sum_{n=1}^{\infty} a_n^2 < \infty. \tag{3.12} \]

In [20] the authors show that for the problem at hand, the policy $\alpha$ is optimal in that
$J(\alpha) = J(g)$.
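
A projected stochastic approximation update of the kind described above can be sketched as follows. The code is an illustration under explicit assumptions, not the algorithm of [21]: the sign of the correction is chosen so that the bias decreases when the observed cost exceeds the target, which is the stabilizing direction when the cost increases with the bias as in (3.7); the step size $1/n$ is one hypothetical choice compatible with the usual conditions of vanishing steps, divergent sum and convergent sum of squares; the function names are invented for the example.

```python
def clip01(x):
    """Projection of a real number onto the interval [0, 1]."""
    return max(0.0, min(1.0, x))

def step_size(n):
    """Hypothetical step-size sequence 1/n for n = 1, 2, ..."""
    return 1.0 / n

def sa_update(eta, a_n, target, observed_cost):
    """One projected update: move the bias against the constraint error.

    If observed_cost > target the bias decreases, otherwise it increases
    (assumes the long-run cost is increasing in the bias).
    """
    return clip01(eta + a_n * (target - observed_cost))
```

In a simulation loop, the update would be applied once per slot with the cost evaluated at the newly observed state, and the resulting bias fed into the randomized policy for the next decision.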

A  general  convergence  result 

The two examples discussed above can both be accommodated into the following general
framework: Let $g$ be a Markov stationary policy in $\Pi$ held fixed hereafter, and consider
an admissible policy $\alpha$ in $\Pi$ to be an implementation of it. The question of interest here can
be formulated as one of finding natural conditions under which $J(\alpha) = J(g)$.

To that end, it will be convenient to say that an admissible policy $\alpha$ in $\Pi$ satisfies the
convergence condition (C) (with respect to $g$) if

(C): The RV's $\{\alpha_n(H(n)) - g(X(n))\}_1^\infty$ converge to 0 in probability under $P^\alpha$, i.e., for
every $\varepsilon > 0$,

\[ \lim_{n \uparrow \infty} P^\alpha\big[ \, |\alpha_n(k; H(n)) - g(k; X(n))| > \varepsilon \, \big] = 0, \qquad 0 \le k \le K. \]

This paper is devoted to the study of the performance properties of admissible policies
$\alpha$ satisfying the convergence condition (C). To introduce the necessary hypotheses, it is
convenient to define the mapping $Z : \mathbb{N}^{K+1} \to \mathbb{R}_+$ given by

\[ Z(x) := \sum_{k=0}^{K} \frac{x_k}{\mu_k} \tag{3.14} \]

for all $x$ in $\mathbb{N}^{K+1}$. The main result will be derived under the following technical conditions
(H1)-(H5), where

(H1): The RV's $\{Z(X(n))\}_1^\infty$ are uniformly integrable under the probability measure
$P^g$;

(H2): The RV's $\{Z(X(n))\}_1^\infty$ are uniformly integrable under the probability measure
$P^\alpha$;

(H3):  The  growth  condition 

f  CC 

Ea  ^ 

V  = 

holds; 

(H4): The RV's $\{c(X(n))\}_1^\infty$ are uniformly integrable under the probability measure
$P^g$, and

(H5): The RV's $\{c(X(n))\}_1^\infty$ are uniformly integrable under the probability measure
$P^\alpha$.

Theorem 3.1. Under the foregoing assumptions (R1)-(R3), whenever the conditions (H1)-(H5)
are enforced, and the policy $\alpha$ satisfies the convergence condition (C) with respect to the
non-idling policy $g$, the convergence

\[ J(\alpha) = \lim_{n \uparrow \infty} \frac{1}{n} \sum_{i=1}^{n} c(X(i)) = J(g) \tag{3.15} \]

takes place in $L^1(\Omega, F, P^\alpha)$, whence

\[ J(\alpha) = \lim_{n \uparrow \infty} \frac{1}{n} E^\alpha \sum_{i=1}^{n} c(X(i)) = J(g). \tag{3.16} \]

Proof: The result follows readily from Theorem 7.1. □


At this stage, the reader may wonder as to how easy it is to verify the conditions (H1)-(H5)
from the basic data of the problem. Sufficient conditions for establishing (H1)-(H5) are
now given in the form of the additional requirements (R4) and (R5) on the data of the problem,
namely

(R4): There exists some constant $\gamma > 1$ such that for every policy $\pi$ in $\Pi$, the moment
conditions

\[ E^\pi\Big[ \Big( \sum_{k=0}^{K} \Xi_k \Big)^{\gamma} \Big]
   = \sum_{x \in \mathbb{N}^{K+1}} \Big( \sum_{k=0}^{K} x_k \Big)^{\gamma} q_\Xi(x) < \infty \]

and

\[ E^\pi\Big[ \Big( \sum_{k=0}^{K} A_k(n) \Big)^{\gamma} \Big]
   = \sum_{a \in \mathbb{N}^{K+1}} \Big( \sum_{k=0}^{K} a_k \Big)^{\gamma} q(a) < \infty, \qquad n = 1, 2, \ldots \]

hold true.

Moreover, the mapping $c$ is assumed to satisfy the following growth condition (R5), where

(R5): There exist constants $\delta > 0$ and $L > 0$ in $\mathbb{R}$ such that

\[ |c(x)| \le L \, (1 + |x|^{\delta}) \]

for all $x \neq 0$ in $\mathbb{N}^{K+1}$.


Theorem 3.2. Assume the policies $g$ and $\alpha$ to be non-idling. Under the foregoing assumptions
(R1)-(R5), assumptions (H1)-(H5) are satisfied whenever $\gamma$ is such that

\[ \max\{3, \ 1 + \delta(1+\varepsilon)\} < \gamma \tag{3.17} \]

for some $\varepsilon > 0$.


Proof: The bound

\[ 0 \le Z(x) \le \Big( \min_{0 \le k \le K} \mu_k \Big)^{-1} |x| \tag{3.18} \]

is obviously valid for all $x$ in $\mathbb{N}^{K+1}$, whereas Theorem 6.1 gives

\[ \sup_n E^\pi |X(n)|^{\gamma - 1} < \infty \tag{3.19} \]

for any non-idling policy $\pi$ in $\Pi$, since condition (3.17) implies $\gamma - 1 > 2$. The validity of the
assumptions (H1)-(H3) is now immediate upon combining remarks (3.18)-(3.19). To obtain
(H4)-(H5), it suffices to note that (3.17) implies (6.4) and to apply Corollary 6.1.1. □


An  operational  version  of  the  main  convergence  result  of  Theorem  3.1  can  now  be  given. 

Theorem 3.1bis. Assume the policy $g$ to be non-idling. Under the foregoing assumptions
(R1)-(R5) with (3.17), any non-idling policy $\alpha$ in $\Pi$ which satisfies the convergence condition
(C) with respect to $g$ has the property that

\[ J(\alpha) = \lim_{n \uparrow \infty} \frac{1}{n} E^\alpha \sum_{i=1}^{n} c(X(i)) = J(g). \tag{3.20} \]

Proof: The result follows readily from combining Theorems 3.1 and 3.2.


4.  PASSAGE  TIMES  TO  THE  EMPTY  STATE: 

Throughout this section, let $\pi$ be a fixed non-idling (not necessarily Markov) policy in
$\Pi$. The results given below extend to randomized policies some of the results obtained by
Baras, Dorsey and Makowski ([1], Section 4) for non-randomized policies only. Here too the
discussion focuses on generating exponential or Wald martingales which are rich enough to
yield statistical information on various passage times for the queue size sequence $\{X(n)\}_1^\infty$.
The quantities of interest include first passage times to the empty state (Theorem 4.1) and
first exit times from the empty state (Lemma 4.2) in order to study the busy and idle periods
of such a system.

At this stage, it is convenient to introduce the sequence $\{V(n)\}_1^\infty$ of $\{0,1\}^{K+1}$-valued
RV's which are defined componentwise by

\[ V_k(n) := I[X_k(n) \neq 0]\, U_k(n), \qquad 0 \le k \le K, \quad n = 1, 2, \ldots \tag{4.1} \]

Note that with this notation, the relation (2.9) (valid for the non-idling policy $\pi$) readily
translates into the identity

\[ \sum_{k=0}^{K} V_k(n) = I[X(n) \neq 0] \quad P^\pi\text{-a.s.} \qquad n = 1, 2, \ldots \tag{4.2} \]

The next proposition is essential for generating a $(P^\pi, F_n)$-martingale of interest; it
presents a variation of a basic relation already given by Baras, Dorsey and Makowski in [1].
For ease of notation, let $\{F_n^*\}_1^\infty$ denote the filtration on the sample space $\Omega$ defined by

\[ F_n^* := F_n \vee \sigma\{U(n)\} = \sigma\{H(n), U(n)\}, \qquad n = 1, 2, \ldots \tag{4.3} \]


Lemma 4.1. Under the foregoing assumptions (R1)-(R4), for every z in $(0,1]^{K+1}$, the equality

$$E^{\pi}\Big[\prod_{k=0}^{K} z_k^{X_k(n+1)} \,\Big|\, F_n^*\Big] = a(z) \prod_{k=0}^{K} b_k(z_k)^{V_k(n)} \prod_{k=0}^{K} z_k^{X_k(n)}, \qquad n=1,2,\ldots \tag{4.4}$$

holds true, with

$$a(z) := E^{\pi}\Big[\prod_{k=0}^{K} z_k^{A_k(n)}\Big] \tag{4.5a}$$

and

$$b_k(z_k) := E^{\pi}\big[z_k^{-B_k(n)}\big] = \frac{\mu_k}{z_k} + (1-\mu_k), \qquad 0 \le k \le K. \tag{4.5b}$$

Note that the right-hand sides of (4.5) depend neither on the policy $\pi$ nor on the time index
n owing to the assumption (R2).


Proof: Upon substitution of the system dynamics (2.2) into the left-hand side of (4.4),
easy calculations lead to the relation

$$E^{\pi}\Big[\prod_{k=0}^{K} z_k^{X_k(n+1)} \,\Big|\, F_n^*\Big] = \prod_{k=0}^{K} z_k^{X_k(n)} \cdot E^{\pi}\Big[\prod_{k=0}^{K} z_k^{A_k(n) - v_k B_k(n)} \,\Big|\, F_n^*\Big]_{v=V(n)}, \qquad n=1,2,\ldots \tag{4.6}$$

where the $F_n^*$-measurability of the RV's X(n) and V(n) has been used.

Owing to the assumption (R2) imposed on the probability measure $P^{\pi}$, the mutually
independent RV's A(n) and B(n) are also seen to be independent of the $\sigma$-field $F_n^*$.
Consequently, for all v in $\{0,1\}^{K+1}$, the relation

$$E^{\pi}\Big[\prod_{k=0}^{K} z_k^{A_k(n) - v_k B_k(n)} \,\Big|\, F_n^*\Big] = E^{\pi}\Big[\prod_{k=0}^{K} z_k^{A_k(n)}\Big] \prod_{k=0}^{K} b_k(z_k)^{v_k}, \qquad n=1,2,\ldots \tag{4.7}$$

holds true, with the mutual independence of the components of the RV B(n) being used to
get the product form of the second factor. Substitution of (4.7) into (4.6) readily yields (4.4)
with the notation (4.5). □
In view of (4.2), the relation (4.4) is expected to simplify whenever all K+1 factors
$b_k(z_k)$, $0 \le k \le K$, are equal. To that end, note that each one of the mappings
$z_k \to b_k(z_k)$, $0 \le k \le K$, decreases monotonically from $+\infty$ to 1 on the interval (0,1]. Hence,
for each b in the interval $[1,+\infty)$, there exists exactly one element in $(0,1]^{K+1}$ with the
property that

$$b_k(z_k) = \frac{\mu_k}{z_k} + (1-\mu_k) = b, \qquad 0 \le k \le K. \tag{4.8}$$

In fact, a simple calculation shows that the components of this element, denoted throughout
by z(b), are given by

$$z_k(b) = \frac{\mu_k}{\mu_k + b - 1}, \qquad 0 \le k \le K. \tag{4.9}$$
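The inversion behind (4.8)-(4.9) can be checked numerically. The following is a minimal sketch, assuming hypothetical service parameters `mus` (values of $\mu_k$ chosen only for illustration); the function names are not from the paper.

```python
def b_transform(z, mu):
    """b_k(z) = mu/z + (1 - mu), as in (4.5b), for z in (0, 1]."""
    return mu / z + (1.0 - mu)


def z_of_b(b, mus):
    """Components z_k(b) = mu_k / (mu_k + b - 1) of (4.9), for b >= 1."""
    return [mu / (mu + (b - 1.0)) for mu in mus]


mus = [0.5, 0.8, 0.3]  # hypothetical service parameters mu_k in (0, 1)
b = 1.7                # any level b >= 1
zs = z_of_b(b, mus)
levels = [b_transform(z, mu) for z, mu in zip(zs, mus)]
print(zs, levels)      # every entry of `levels` should equal b up to rounding
```

Each $z_k(b)$ lies in (0,1] and all K+1 factors $b_k(z_k(b))$ coincide with b, which is the property exploited in the martingale construction that follows.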


Let $\{L(n)\}_1^\infty$ denote the $\mathbb{N}$-valued RV's that count the time (expressed in slots) spent
in the empty state, i.e.,

$$L(n+1) := \sum_{i=1}^{n} I[X(i)=0], \qquad n=1,2,\ldots \tag{4.10}$$

with L(1) := 0. For every b in $[1,+\infty)$, the $\mathbb{R}_+$-valued RV's $\{M(n,b)\}_1^\infty$ are defined by

$$M(n,b) := \frac{\prod_{k=0}^{K} z_k(b)^{X_k(n)}\; b^{L(n)}}{r(b)^{n}}, \qquad n=1,2,\ldots \tag{4.11}$$

where for notational convenience,

$$r(b) := a(z(b))\, b. \tag{4.12}$$


Proposition 4.1. Under the foregoing assumptions (R1)-(R4), the RV's $\{M(n,b)\}_1^\infty$ form
an integrable positive $(P^{\pi}, F_n)$-martingale for every $b \ge 1$.


Proof: The integrability of the RV's $\{M(n,b)\}_1^\infty$ is readily established from the easy
bounds

$$0 < M(n,b) \le \frac{b^{L(n)}}{r(b)^{n}} \le \Big(\frac{b}{r(b)}\Big)^{n} \tag{4.13}$$

valid for all n=1,2,... and $b \ge 1$.

For every b in the interval $[1,+\infty)$, the relations $b_k(z_k(b)) = b$, $0 \le k \le K$, hold true
by the very definition (4.9) of z(b), and it is now plain that

$$\prod_{k=0}^{K} b_k(z_k(b))^{V_k(n)} = b^{\sum_{k=0}^{K} V_k(n)} = b^{\,1 - I[X(n)=0]}, \qquad n=1,2,\ldots \tag{4.14}$$

where the last equality is obtained with the help of (4.2). Substitution of (4.14) into (4.4)
(with z = z(b)) implies

$$E^{\pi}\Big[\prod_{k=0}^{K} z_k(b)^{X_k(n+1)} \,\Big|\, F_n^*\Big] = a(z(b))\, b^{\,1 - I[X(n)=0]} \prod_{k=0}^{K} z_k(b)^{X_k(n)} \tag{4.15}$$

and the RV X(n) being $F_n$-measurable with $F_n \subset F_n^*$, an elementary smoothing argument
now yields

$$E^{\pi}\Big[\prod_{k=0}^{K} z_k(b)^{X_k(n+1)} \,\Big|\, F_n\Big] = r(b)\, \frac{\prod_{k=0}^{K} z_k(b)^{X_k(n)}}{b^{\,I[X(n)=0]}}, \qquad n=1,2,\ldots \tag{4.16}$$

The martingale property of the sequence $\{M(n,b)\}_1^\infty$ is now immediate from (4.16).
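The one-step identity (4.16) can be verified exactly in a toy case. The sketch below assumes a single queue (K = 0) with Bernoulli arrivals of parameter `lam` and Bernoulli service completions of parameter `mu`, and takes the dynamics to be X(n+1) = X(n) + A(n) - I[X(n) ≠ 0] B(n); this arrival law and this concrete form of (2.2) are illustrative assumptions, not statements from the paper.

```python
def one_step_expectation(x, z, lam, mu):
    """E[z**X(n+1) | X(n) = x], by enumerating A(n), B(n) in {0, 1}."""
    total = 0.0
    for arrivals, p_arr in ((0, 1.0 - lam), (1, lam)):
        for service, p_srv in ((0, 1.0 - mu), (1, mu)):
            # Non-idling single queue: a departure can occur only if x > 0.
            departures = service if x > 0 else 0
            total += p_arr * p_srv * z ** (x + arrivals - departures)
    return total


lam, mu, b = 0.3, 0.6, 1.4        # hypothetical parameters, with b >= 1
z = mu / (mu + (b - 1.0))         # z(b) from (4.9)
r = b * (1.0 - lam + lam * z)     # r(b) = a(z(b)) * b with a(z) = 1 - lam + lam*z


def right_hand_side(x):
    """r(b) * z(b)**x / b**I[x = 0], the right-hand side of (4.16)."""
    return r * z ** x / (b if x == 0 else 1.0)


for x in range(6):
    print(x, one_step_expectation(x, z, lam, mu), right_hand_side(x))
```

Both columns agree for every state x, including the empty state where the extra factor 1/b appears; dividing by $r(b)^{n+1}$ and multiplying by $b^{L(n+1)}$ then gives the martingale property of M(n,b).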


The particular structure of the martingales $\{M(n,b)\}_1^\infty$, $b \ge 1$, is now exploited to
obtain information on the first passage times to the empty state. To that end, if $\sigma$ is any
arbitrary $F_n$-stopping time, let $\nu(\sigma)$ be the $\mathbb{N}$-valued RV defined by

$$\nu(\sigma) := \inf\{\, n \ge 1 : X(\sigma+n) = 0 \,\} \quad \text{if } \sigma < \infty, \tag{4.17}$$

with the convention that $\nu(\sigma) = \infty$ whenever this set is empty or when $\sigma = \infty$; the RV
$\tau(\sigma) := \sigma + \nu(\sigma)$ is clearly an $F_n$-stopping time.


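Theorem 4.1. Under the foregoing assumptions (R1)-(R4), the conservation law

$$E^{\pi}\Big[\, I[\sigma<\infty,\ \tau(\sigma)<\infty]\, \frac{1}{r(b)^{\nu(\sigma)}} \,\Big|\, F_{\sigma}\Big] = I[\sigma<\infty] \prod_{k=0}^{K} z_k(b)^{X_k(\sigma)}\; b^{-I[X(\sigma)=0]} \tag{4.18}$$

holds true $P^{\pi}$-a.s. for all $b \ge 1$.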


Proof:  See  Appendix. 

An  immediate  consequence  of  this  result  is  stated  in  the  following 

Corollary 4.1.1. Under the assumptions of Theorem 4.1, the relation

$$P^{\pi}[\sigma<\infty,\ \nu(\sigma)<\infty \mid F_{\sigma}] = I[\sigma<\infty] \quad P^{\pi}\text{-a.s.} \tag{4.19}$$

holds true, and in particular, if $\sigma<\infty$ $P^{\pi}$-a.s., then necessarily $\nu(\sigma)<\infty$ $P^{\pi}$-a.s.


Proof: The events $[\sigma<\infty,\ \tau(\sigma)<\infty]$ and $[\sigma<\infty,\ \nu(\sigma)<\infty]$ coincide, and the result (4.19)
follows readily by letting the variable b go to 1 monotonically in (4.18) and using the Monotone
Convergence Theorem for conditional expectations. The second part of the corollary is now
immediate. □


As the forthcoming discussion will show, the conservation law given in Theorem 4.1 can
be used to study the structure of the busy cycles of the chain $\{X(n)\}_1^\infty$. To that end, for
any $F_n$-stopping time $\sigma$, define the $\mathbb{N}$-valued RV $\beta(\sigma)$ by

$$\beta(\sigma) := \inf\{\, n \ge 0 : A(\sigma+n) \neq 0 \,\} \quad \text{if } \sigma < \infty, \tag{4.20}$$

with the convention that $\beta(\sigma) = \infty$ whenever this set is empty or when $\sigma = \infty$; the RV
$\gamma(\sigma) := \sigma + \beta(\sigma)$ is clearly an $F_n$-stopping time. Several useful properties of the RV's $\beta(\sigma)$,
$\gamma(\sigma)$ and $A(\gamma(\sigma))$ are now given for easy reference.


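Lemma 4.2. Under the foregoing assumptions (R1)-(R4), for every $r \ge 1$ and every z in
$(0,1]^{K+1}$, the relation

$$E^{\pi}\Big[\, I[\sigma<\infty,\ \beta(\sigma)<\infty]\; r^{-\beta(\sigma)} \prod_{k=0}^{K} z_k^{A_k(\gamma(\sigma))} \,\Big|\, F_{\sigma}\Big] = I[\sigma<\infty]\, \frac{r\,[a(z) - q(0)]}{r - q(0)} \tag{4.21}$$

holds true $P^{\pi}$-a.s., and consequently

$$P^{\pi}[\sigma<\infty,\ \beta(\sigma)=l \mid F_{\sigma}] = I[\sigma<\infty]\; q(0)^{l}\,[1 - q(0)], \qquad l=0,1,\ldots \tag{4.22}$$


In the particular case when $\sigma<\infty$ $P^{\pi}$-a.s., Lemma 4.2 thus implies that the RV $\beta(\sigma)$ is
also $P^{\pi}$-a.s. finite, has a geometric distribution with parameter q(0) and is independent of
the RV $A(\gamma(\sigma))$; moreover these two RV's are (jointly) independent from the $\sigma$-field $F_{\sigma}$.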
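As a small numerical illustration of the geometric law (4.22), the sketch below sums the series for the transform $E[r^{-\beta(\sigma)}]$ and compares it with the closed form $r[1-q(0)]/[r-q(0)]$. This closed form is obtained here by summing the geometric series directly; it is what (4.21) gives at z = (1,...,1) under the reading of (4.21) adopted in this sketch. The values of `q0` and `r` are hypothetical.

```python
def beta_pmf(l, q0):
    """P[beta = l] = q0**l * (1 - q0), the geometric law in (4.22)."""
    return q0 ** l * (1.0 - q0)


q0, r = 0.35, 1.25   # hypothetical q(0) in (0, 1) and transform variable r >= 1

total_mass = sum(beta_pmf(l, q0) for l in range(200))
transform = sum(beta_pmf(l, q0) * r ** (-l) for l in range(200))
closed_form = r * (1.0 - q0) / (r - q0)
mean_idle_extension = sum(l * beta_pmf(l, q0) for l in range(200))

print(total_mass, transform, closed_form, mean_idle_extension)
```

The mean of $\beta(\sigma)$ is q(0)/(1 - q(0)), so an idle period of length $1+\beta$ lasts 1/(1 - q(0)) slots on average.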


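Proof: It is plain from the definition of the RV $\beta(\sigma)$ that for all l=0,1,...,

$$[\beta(\sigma) = l] = [\, A(\sigma+j)=0,\ 0 \le j < l;\ A(\sigma+l) \neq 0 \,]. \tag{4.23}$$

For $l \neq 0$ in $\mathbb{N}$, a smoothing argument using the inclusion $F_{\sigma} \subset F_{\sigma+(l-1)}$ leads to the chain of
equalities

$$\begin{aligned}
E^{\pi}\Big[\, I[\sigma<\infty,\ \beta(\sigma)=l] \prod_{k=0}^{K} z_k^{A_k(\gamma(\sigma))} \,\Big|\, F_{\sigma}\Big]
&= E^{\pi}\Big[\, I[\sigma<\infty] \Big(\prod_{0 \le j < l} I[A(\sigma+j)=0]\Big) I[A(\sigma+l) \neq 0] \prod_{k=0}^{K} z_k^{A_k(\sigma+l)} \,\Big|\, F_{\sigma}\Big] \\
&= E^{\pi}\Big[ \Big(\prod_{0 \le j < l} I[A(\sigma+j)=0]\Big)\, E^{\pi}\Big[\, I[\sigma<\infty]\, I[A(\sigma+l) \neq 0] \prod_{k=0}^{K} z_k^{A_k(\sigma+l)} \,\Big|\, F_{\sigma+(l-1)}\Big] \,\Big|\, F_{\sigma}\Big].
\end{aligned} \tag{4.24}$$

Under $P^{\pi}$, the RV $A(\sigma+l)$ is distributed according to the common distribution $q(\cdot)$ of
the i.i.d. sequence $\{A(n)\}_1^\infty$ and is clearly independent of the $\sigma$-field $F_{\sigma+(l-1)}$. These facts
are readily established from the properties (P1)-(P4), which also imply

$$P^{\pi}[A(\sigma+n)=0] = q(0), \qquad n=0,1,\ldots \tag{4.25}$$

It is now clear that

$$E^{\pi}\Big[\, I[\sigma<\infty]\, I[A(\sigma+l) \neq 0] \prod_{k=0}^{K} z_k^{A_k(\sigma+l)} \,\Big|\, F_{\sigma+(l-1)}\Big] = I[\sigma<\infty]\,[a(z) - q(0)]. \tag{4.26}$$

Substitution of (4.26) into (4.24) easily implies that

$$E^{\pi}\Big[\, I[\sigma<\infty,\ \beta(\sigma)=l] \prod_{k=0}^{K} z_k^{A_k(\gamma(\sigma))} \,\Big|\, F_{\sigma}\Big] = I[\sigma<\infty] \Big(\prod_{0 \le j < l} P^{\pi}[A(\sigma+j)=0]\Big) [a(z) - q(0)] \tag{4.27}$$

by making use of the $F_{\sigma}$-measurability of the event $[\sigma<\infty]$ and by noting that the RV's
$\{A(\sigma+k),\ 0 \le k < l\}$ are jointly independent (under $P^{\pi}$) from the $\sigma$-field $F_{\sigma}$ as a consequence
of (P1)-(P4). The relation

$$E^{\pi}\Big[\, I[\sigma<\infty,\ \beta(\sigma)=l] \prod_{k=0}^{K} z_k^{A_k(\gamma(\sigma))} \,\Big|\, F_{\sigma}\Big] = I[\sigma<\infty]\; q(0)^{l}\,[a(z) - q(0)] \tag{4.28}$$

obtains by direct inspection upon substituting (4.25) into (4.27). The reader will readily
check by arguments using earlier remarks that (4.28) also holds for l=0, whence the
conclusions (4.21) and (4.22) are now immediate by elementary calculations. □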


5.  PROPERTIES  OF  BUSY  CYCLES: 

This section is devoted to the study of the busy cycles when the system is operated
under a fixed non-idling policy $\pi$. To that end, consider the following collections of $\mathbb{N}$-valued
RV's, namely $\{\tau_n\}_1^\infty$, $\{\gamma_n\}_1^\infty$ and $\{\nu_n\}_1^\infty$, whose definitions and interpretations are now
presented. First, pose

$$\tau_1 := \nu(1)\, I[\Xi \neq 0] + I[\Xi = 0] \tag{5.1}$$

and recursively define

$$\gamma_n := \gamma(\tau_n) + 1 = \tau_n + \beta(\tau_n) + 1, \qquad n=1,2,\ldots \tag{5.2a}$$

$$\nu_n := \nu(\gamma_n), \qquad n=1,2,\ldots \tag{5.2b}$$

and

$$\tau_{n+1} := \tau(\gamma_n) = \gamma_n + \nu_n, \qquad n=1,2,\ldots \tag{5.2c}$$

where the definitions (4.17) and (4.20) are used. Note that $\tau_n$ is the n-th time epoch at
which the queueing system empties, and $\gamma_n$ represents the first time after $\tau_n$ that the system
is again not empty. Since the interval $[\tau_n, \tau_{n+1})$ can be viewed as the n-th busy cycle, it is
natural to consider the $\mathbb{N}$-valued RV's $\{\theta_n\}_1^\infty$ defined by

$$\theta_{n+1} := \tau_{n+1} - \tau_n = 1 + \beta(\tau_n) + \nu(\gamma_n), \qquad n=1,2,\ldots \tag{5.3}$$

with $\theta_1 := \tau_1$. It is clear that $1+\beta(\tau_n)$ and $\nu(\gamma_n)$ are the lengths of the n-th idle period and
n-th busy period, respectively, whereas $\theta_{n+1}$ is exactly the length of the n-th busy cycle.

The RV's $\{\tau_n\}_1^\infty$ and $\{\gamma_n\}_1^\infty$ are $F_n$-stopping times with the property that for all
n=1,2,...,

$$X(\tau_n) = 0 \quad \text{on the event } [\tau_n < \infty] \tag{5.4a}$$

and

$$X(\gamma_n) = A(\gamma(\tau_n)) \quad \text{on the event } [\tau_n < \infty,\ \beta(\tau_n) < \infty]. \tag{5.4b}$$

Moreover, owing to Corollary 4.1.1 and to the remarks following Lemma 4.2, it is easy to see
recursively that the RV's $\tau_n$, $\beta(\tau_n)$, $\gamma_n$ and $\nu_n$ are all $P^{\pi}$-a.s. finite.

Theorem 5.1. The RV $\tau_1$ satisfies the relation

$$E^{\pi}\Big[\frac{1}{r(b)^{\tau_1}} \,\Big|\, F_1\Big] = \prod_{k=0}^{K} z_k(b)^{\Xi_k}\; I[\Xi \neq 0] + \frac{1}{r(b)}\, I[\Xi = 0] \tag{5.5}$$

for every $b \ge 1$, and its conditional moment is given by

$$E^{\pi}[\tau_1 \mid F_1] = I[\Xi = 0] + \frac{1}{1-\rho} \sum_{k=0}^{K} \frac{\Xi_k}{\mu_k}. \tag{5.6}$$

The expression (5.5) for the conditional probability generating function of the RV $\tau_1$ clearly
shows that the conditional distribution of the RV $\tau_1$ given the $\sigma$-field $F_1$ is independent of
the non-idling policy $\pi$ used, a property that is immediately transferred to the (unconditional)
distribution of the RV $\tau_1$.


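Proof: As the very definition of $\tau_1$ implies

$$E^{\pi}\Big[\frac{1}{r(b)^{\tau_1}} \,\Big|\, F_1\Big] = E^{\pi}\Big[\frac{1}{r(b)^{\nu(1)}} \,\Big|\, F_1\Big] I[\Xi \neq 0] + \frac{1}{r(b)}\, I[\Xi = 0], \tag{5.7}$$

a direct application of Theorem 4.1 (with $\sigma = 1$) then readily yields the first part of the
result. The relation (5.5) can be rewritten in the equivalent form

$$E^{\pi}\Big[\frac{1}{r(b)^{\tau_1}} - 1 \,\Big|\, F_1\Big] = \Big(\prod_{k=0}^{K} z_k(b)^{\Xi_k} - 1\Big) I[\Xi \neq 0] + \Big(\frac{1}{r(b)} - 1\Big) I[\Xi = 0] \tag{5.8}$$

and a well-known identity for geometric series thus implies

$$E^{\pi}\Big[\sum_{0 \le k < \tau_1} \Big(\frac{1}{r(b)}\Big)^{k} \,\Big|\, F_1\Big] = I[\Xi = 0] + \frac{\prod_{k=0}^{K} z_k(b)^{\Xi_k} - 1}{r(b)^{-1} - 1}\; I[\Xi \neq 0]. \tag{5.9}$$

Elementary calculations now show that

$$\frac{\prod_{k=0}^{K} z_k(b)^{\Xi_k} - 1}{r(b)^{-1} - 1} = \frac{\exp\big[\sum_{k=0}^{K} \Xi_k \ln z_k(b)\big] - 1}{b-1} \times r(b) \Big[\frac{1 - r(b)}{b-1}\Big]^{-1} \tag{5.10}$$

with

$$\lim_{b \downarrow 1} \frac{1 - r(b)}{b-1} = -1 - \lim_{b \downarrow 1} \frac{a(z(b)) - 1}{b-1} = -(1-\rho) \tag{5.11}$$

and $\lim_{b \downarrow 1} r(b) = 1$ by virtue of (4.5a), (4.9) and (4.12), and

$$\lim_{b \downarrow 1} \frac{\exp\big[\sum_{k=0}^{K} \Xi_k \ln z_k(b)\big] - 1}{b-1} = -\sum_{k=0}^{K} \frac{\Xi_k}{\mu_k}. \tag{5.12}$$

Finally, note that

$$\lim_{b \downarrow 1} E^{\pi}\Big[\sum_{0 \le k < \tau_1} \Big(\frac{1}{r(b)}\Big)^{k} \,\Big|\, F_1\Big] = E^{\pi}[\tau_1 \mid F_1] \tag{5.13}$$

by the Monotone Convergence Theorem for conditional expectations, whence the moment
result (5.6) follows upon letting b go to 1 monotonically in (5.10) and making use of the
limits (5.11)-(5.12). □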
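The limiting argument of the proof can be reproduced numerically. The sketch below assumes independent Bernoulli arrivals with parameters `lams`, so that $a(z) = \prod_k (1 - \lambda_k + \lambda_k z_k)$ and $\rho = \sum_k \lambda_k/\mu_k$; this arrival model, the parameter values and the initial state `xi` are illustrative assumptions only. It evaluates the ratio in (5.11) and the busy-period term of (5.9) at a value of b slightly above 1.

```python
import math

lams = [0.1, 0.2]   # hypothetical arrival parameters lambda_k
mus = [0.5, 0.8]    # hypothetical service parameters mu_k
rho = sum(l / m for l, m in zip(lams, mus))


def z_of_b(b):
    """z_k(b) = mu_k / (mu_k + b - 1), as in (4.9)."""
    return [m / (m + (b - 1.0)) for m in mus]


def a(z):
    """Generating function of the assumed independent Bernoulli arrivals."""
    return math.prod(1.0 - l + l * zk for l, zk in zip(lams, z))


def r(b):
    """r(b) = a(z(b)) * b, as in (4.12)."""
    return b * a(z_of_b(b))


def busy_term(xi, b):
    """(prod_k z_k(b)**xi_k - 1) / (1/r(b) - 1), the ratio appearing in (5.9)."""
    prod = math.prod(zk ** x for zk, x in zip(z_of_b(b), xi))
    return (prod - 1.0) / (1.0 / r(b) - 1.0)


h = 1e-6
xi = [2, 1]  # hypothetical non-empty initial queue sizes
slope = (1.0 - r(1.0 + h)) / h
limit_value = sum(x / m for x, m in zip(xi, mus)) / (1.0 - rho)
print(slope, -(1.0 - rho))
print(busy_term(xi, 1.0 + h), limit_value)
```

For these values the second line prints numbers close to 9.545, the expected time to empty the system from the state (2, 1) under any non-idling policy.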


Theorem 5.2. Under the foregoing assumptions (R1)-(R4), the relations

$$E^{\pi}\Big[\frac{1}{r^{\beta(\tau_n)}} \times \frac{1}{r(b)^{\nu(\gamma_n)}} \,\Big|\, F_{\tau_n}\Big] = \frac{r\,[a(z(b)) - q(0)]}{r - q(0)}, \qquad n=1,2,\ldots \tag{5.14}$$

hold true for all r and b in the interval $[1,+\infty)$. In particular, the RV's $\beta(\tau_n)$ and $\nu(\gamma_n)$ are
conditionally independent of the $\sigma$-field $F_{\tau_n}$.


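Proof: A direct application of Theorem 4.1 (with $\sigma = \gamma_n$) readily implies that

$$E^{\pi}\Big[\, I[\gamma_n<\infty,\ \tau_{n+1}<\infty]\, \frac{1}{r(b)^{\nu(\gamma_n)}} \,\Big|\, F_{\gamma_n}\Big] = I[\gamma_n<\infty] \prod_{k=0}^{K} z_k(b)^{A_k(\gamma(\tau_n))} \tag{5.15}$$

where use has been made of the property (5.4b). The $\sigma$-field inclusion $F_{\tau_n} \subset F_{\gamma_n}$ and an
elementary smoothing argument using (5.15) now imply that

$$E^{\pi}\Big[\, I[\gamma_n<\infty,\ \tau_{n+1}<\infty]\, \frac{1}{r^{\beta(\tau_n)}} \times \frac{1}{r(b)^{\nu(\gamma_n)}} \,\Big|\, F_{\tau_n}\Big] = E^{\pi}\Big[\, I[\gamma_n<\infty]\, \frac{1}{r^{\beta(\tau_n)}} \prod_{k=0}^{K} z_k(b)^{A_k(\gamma(\tau_n))} \,\Big|\, F_{\tau_n}\Big]. \tag{5.16}$$

The relation (5.14) follows immediately from (5.16) upon applying Lemma 4.2 (with $\sigma = \tau_n$)
and making use of the $P^{\pi}$-a.s. finiteness of the RV's $\tau_n$ and $\tau_{n+1}$. □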


The  statistical  structure  of  the  busy  cycles  can  now  be  easily  obtained. 


Theorem 5.3. Under the foregoing assumptions (R1)-(R4), the RV's $\{\theta_n\}_1^\infty$ form a (possibly
delayed) renewal sequence. More precisely, the RV's $\{\theta_n\}_2^\infty$ are i.i.d. RV's mutually
independent of the $\sigma$-field $F_{\tau_1}$, with common distribution independent of the non-idling policy $\pi$ and
characterized by

$$E^{\pi}\Big[\frac{1}{r(b)^{\theta_n}}\Big] = \frac{a(z(b)) - q(0)}{r(b) - q(0)}, \qquad n=2,3,\ldots \tag{5.17}$$

for all $b \ge 1$.


Proof: The result follows immediately from (5.14) (with r = r(b)) upon observing that the
RV's $\{\theta_k,\ 1 \le k \le n\}$ are clearly $F_{\tau_n}$-measurable. □


The independence of the common distribution of the RV's $\{\theta_n\}_2^\infty$ on the non-idling
policy $\pi$ was already obtained through sample path arguments in [1], where the discussion
was carried out in the context of a different, though probabilistically equivalent, model of
competing queues. Here, however, sharper statistical information on the length of busy cycles
has been derived: Indeed, with $a_{\Xi}(\cdot)$ denoting the probability generating function of the RV
$\Xi$ as in (4.5a), the reader will readily conclude from (5.5) and (5.17) that the relations

$$E^{\pi}\Big[\frac{1}{r(b)^{\theta_1}}\Big] = a_{\Xi}(z(b)) + \frac{1 - r(b)}{r(b)}\; q_{\Xi}(0) \tag{5.18a}$$

and

$$E^{\pi}\Big[\frac{1}{r(b)^{\theta_n}}\Big] = \frac{a(z(b)) - q(0)}{r(b) - q(0)}, \qquad n=2,3,\ldots \tag{5.18b}$$

hold for all $b \ge 1$, thus providing explicit expressions for the probability generating functions
of the RV's $\{\theta_n\}_1^\infty$, taken in the variable 1/r(b) (in (0,1] whenever $b \ge 1$ by Lemma A.1).
Consequently, information on the moments of the RV's $\{\theta_n\}_1^\infty$ is now readily available upon
computing the successive right derivatives at b=1 of both sides in (5.18).
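As an illustration of how moments can be read off (5.18b), the sketch below approximates the first right derivative at b=1 by a forward difference and recovers the mean busy-cycle length. It assumes independent Bernoulli arrivals, so that $a(z) = \prod_k (1 - \lambda_k + \lambda_k z_k)$ and $q(0) = \prod_k (1-\lambda_k)$; this model and the parameter values are assumptions made for the example. The comparison value $1/[(1-q(0))(1-\rho)]$ comes from a short calculation on (5.18b) carried out for this sketch and is not stated in the text.

```python
import math

lams = [0.1, 0.2]   # hypothetical arrival parameters lambda_k
mus = [0.5, 0.8]    # hypothetical service parameters mu_k
rho = sum(l / m for l, m in zip(lams, mus))
q0 = math.prod(1.0 - l for l in lams)   # probability of no arrival in a slot


def a_of_b(b):
    """a(z(b)) for the assumed Bernoulli arrivals, with z_k(b) from (4.9)."""
    return math.prod(1.0 - l + l * m / (m + (b - 1.0)) for l, m in zip(lams, mus))


def r(b):
    """r(b) = a(z(b)) * b, as in (4.12)."""
    return b * a_of_b(b)


def cycle_transform(b):
    """Right-hand side of (5.18b): E[r(b)**(-theta_n)] for n >= 2."""
    return (a_of_b(b) - q0) / (r(b) - q0)


# d/db E[r(b)**(-theta)] at b = 1 equals -E[theta] * r'(1), hence the ratio below.
h = 1e-6
mean_cycle = -(cycle_transform(1.0 + h) - 1.0) / (r(1.0 + h) - 1.0)
print(mean_cycle, 1.0 / ((1.0 - q0) * (1.0 - rho)))
```

The mean idle period is 1/(1 - q(0)), so the ratio of mean idle time to mean cycle length is $1-\rho$, as one would expect for a stable single-server system.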

The interested reader will easily check that the mappings $b \to a(z(b))$, $b \to a_{\Xi}(z(b))$
and $b \to r(b)$ are all analytic on the interval $(1,+\infty)$; moreover, existence and computation
of the k-th derivative for each of the functions $b \to 1/r(b)^{n}$, n=1,2,..., require existence and
computation of all the derivatives of order l with $0 \le l \le k$ of the function $b \to a(z(b))$.
With this information in mind, it is easy to extract the following result from (5.18).


Theorem 5.4. Under the assumptions (R1)-(R4), if (R4) holds with $\gamma$ integer, then the
moment result

$$E^{\pi}[\theta_n^{\gamma}] < \infty, \qquad n=1,2,\ldots \tag{5.19}$$

holds true, and in particular,

$$E^{\pi}[\theta_n] < \infty, \qquad n=1,2,\ldots \tag{5.20}$$


In other words, the finiteness of the moments of the RV's $\{\theta_n\}_1^\infty$ is exactly of the same
order as the one for the initial condition and the arrival process.


6.  MOMENT  ESTIMATES  AND  A  REPRESENTATION  OF  THE  COST: 

The results on the length of busy cycles, obtained in the previous section, are now used
to derive bounds on various moments of the sequences of RV's $\{|X(n)|\}_1^\infty$ and
$\{|c(X(n))|\}_1^\infty$ under any non-idling policy $\pi$ in $\Pi$. The key fact is contained in the
observation that the total number of customers in the system at any given time n decreases by at
most one unit in the next time slot [n,n+1), and is therefore bounded above by the number
of slots it takes for the queue sizes to empty for the first time after n. To formalize this idea,
consider the (continuous-time) counting process $\{N(t),\ t \ge 0\}$ naturally associated with
either sequence $\{\tau_n\}_0^\infty$ or $\{\theta_n\}_0^\infty$ (under the convention $\tau_0 = \theta_0 = 0$), i.e., for all $t \ge 0$,

$$N(t) := \max\{\, k \ge 0 : \tau_k \le t \,\} \tag{6.1}$$

with the ready interpretation that N(t) represents the number of times the queue has
returned to the empty state by time t. With this notation, the observation made earlier
now translates into

$$|X(n)| \le \tau_{N(n)+1} - n, \qquad n=1,2,\ldots \tag{6.2}$$
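The counting process (6.1) and the residual time on the right-hand side of (6.2) are straightforward to compute from a list of emptying epochs. The sketch below uses a hypothetical list `taus` (with $\tau_0 = 0$ prepended) and assumes the convention $\tau_k \le t$ in (6.1); the function names are illustrative.

```python
from bisect import bisect_right


def renewal_count(taus, t):
    """N(t) = max{k >= 0 : tau_k <= t}; `taus` is sorted and starts with tau_0 = 0."""
    return bisect_right(taus, t) - 1


def residual_time(taus, t):
    """tau_{N(t)+1} - t, the bound on |X(n)| appearing in (6.2)."""
    return taus[renewal_count(taus, t) + 1] - t


taus = [0, 3, 7, 8, 15]   # hypothetical emptying epochs tau_0, ..., tau_4
for t in (0, 3, 5, 7, 14):
    print(t, renewal_count(taus, t), residual_time(taus, t))
```

Since at most one customer leaves per slot, a system holding m customers at time n needs at least m further slots to empty, which is why the residual time dominates |X(n)|.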

By making use of this fact, it will be possible to show the following strong estimates.

Theorem 6.1. Under the foregoing assumptions (R1)-(R4), there exists a single positive
constant C such that for every non-idling policy $\pi$ in $\Pi$, the moment estimate

$$\sup_{n} E^{\pi}\big[\,|X(n)|^{\gamma-1}\big] \le C < \infty \tag{6.3}$$

holds true.

Theorem 6.1, whose proof is presented below, turns out to be a special case of an
intermediate result of independent interest which is discussed in Theorem 6.2. The reader will
readily observe from Theorem 6.1 that whenever $\gamma > 2$, the RV's $\{X(n)\}_1^\infty$ form a uniformly
integrable sequence under $P^{\pi}$, and this uniformly in the policy $\pi$. Another simple
consequence of Theorem 6.1 is presented in the following corollary.

Corollary 6.1.1. Under the assumptions (R1)-(R5), whenever $\gamma$ is such that

l-v<5(i+e)  <  7  (6.4) 

for some $\varepsilon > 0$, the RV's $\{c(X(n))\}_1^\infty$ are uniformly integrable under the probability measure
$P^{\pi}$ associated with any non-idling policy $\pi$.

As will be clear from the proof given below, this uniform integrability is also uniform
over the class of non-idling policies.

Proof: Assumption (R5) and (6.4) immediately imply that

I  c  (Ar(n  ))  |  <  L  1+{2e  1+  |  X (n  )  |  n=0,l,...(6.5) 

and the result follows by a direct application of Theorem 6.1. □

As shown in Section 5, the process $\{\theta_n\}_1^\infty$ is a delayed renewal process under any
non-idling policy $\pi$ in $\Pi$, with statistics independent of the policy $\pi$. For reference, denote by $G(\cdot)$
the distribution of the RV $\theta_1\ (=\tau_1)$ and by $F(\cdot)$ the common distribution of the i.i.d. RV's
$\{\theta_n\}_2^\infty$. In general, the distributions $G(\cdot)$ and $F(\cdot)$ do not coincide. Now, for any monotone
non-decreasing mapping $r: \mathbb{R}_+ \to \mathbb{R}_+$, define the $\mathbb{R}_+$-valued process $\{R(t),\ t \ge 0\}$ by

$$R(t) := r(\tau_{N(t)+1} - t) \tag{6.6}$$

for all $t \ge 0$, with corresponding expected value

$$M_G(t) := E^{\pi}_{\Xi}\big[r(\tau_{N(t)+1} - t)\big] \quad (= E^{\pi}[R(t)]). \tag{6.7}$$

The subscripts G and $\Xi$ in (6.7) emphasize the fact that the system is started with an
initial queue size $\Xi$ distributed according to the distribution $q_{\Xi}(\cdot)$. If $G(\cdot) = F(\cdot)$, the sequence
$\{\theta_n\}_1^\infty$ is a non-delayed renewal sequence and it is appropriate to pose

$$M_F(t) := E^{\pi}_{F}\big[r(\tau_{N(t)+1} - t)\big]. \tag{6.8}$$

This corresponds to an appropriate choice of the initial condition $\Xi$.

The first part of this section is devoted to the derivation of a bound on the expected
values $\{M_G(t),\ t \ge 0\}$ for any non-idling policy $\pi$, with a view towards generating (via (6.2))
a bound for the sequence of expected values $\{E^{\pi}[r(|X(n)|)]\}_1^\infty$.


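Theorem 6.2. Let $\pi$ be an arbitrary non-idling policy in $\Pi$. Under the finite moment
assumptions

$$\int_0^\infty r(\theta)\, dG(\theta) < \infty, \qquad \int_0^\infty r(\theta)\, dF(\theta) < \infty, \tag{6.9}$$

the condition

$$\int_0^\infty\!\!\int_0^{\theta} r(\theta - t)\, dt\, dF(\theta) = \int_0^\infty\!\!\int_t^{\infty} r(\theta - t)\, dF(\theta)\, dt < \infty \tag{6.10}$$

implies

$$\sup_{t \ge 0} M_G(t) = \sup_{t \ge 0} E^{\pi}[R(t)] < \infty. \tag{6.11}$$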


Proof: Define the mappings $a_G(\cdot),\ a_F(\cdot): \mathbb{R}_+ \to \mathbb{R}_+$ by

$$a_G(t) := \int_t^\infty r(\theta - t)\, dG(\theta), \qquad a_F(t) := \int_t^\infty r(\theta - t)\, dF(\theta) \tag{6.12}$$

for all $t \ge 0$. Since $r(\cdot)$ takes positive values and is monotone non-decreasing, the indefinite
integrals entering the definition (6.12) are well defined and satisfy the obvious inequalities

$$0 \le \int_t^\infty r(\theta - t)\, dG(\theta) \le \int_s^\infty r(\theta - t)\, dG(\theta) \le \int_s^\infty r(\theta - s)\, dG(\theta) \tag{6.13}$$

whenever $0 \le s \le t$, with a similar chain of inequalities when $G(\cdot)$ is replaced by $F(\cdot)$. It is now
clear from (6.9) and (6.13) that $0 \le a_G(t) \le a_G(0) < \infty$ for all $t \ge 0$, whence the mapping
$a_G(\cdot)$ is well defined and monotone non-increasing. Similar comments are of course valid for
$a_F(\cdot)$.

A standard renewal argument ([10], pp. 183) applied to the process $\{R(t),\ t \ge 0\}$
shows that for all $t \ge 0$,

$$M_G(t) = \int_0^t M_F(t - \theta)\, dG(\theta) + \int_t^\infty r(\theta - t)\, dG(\theta) \tag{6.14}$$

and the remarks made earlier thus imply that

$$M_G(t) \le \int_0^t M_F(t - \theta)\, dG(\theta) + \int_0^\infty r(\theta)\, dG(\theta) \tag{6.15}$$

$$\le \sup_{0 \le s \le t} M_F(s) + \int_0^\infty r(\theta)\, dG(\theta). \tag{6.16}$$

This clearly shows that under the assumed condition (6.9), the result (6.11) will hold if it can
be established that

$$\sup_{t \ge 0} M_F(t) < \infty. \tag{6.17}$$
With the notation (6.12), the renewal equation (6.14), this time with $G(\cdot) = F(\cdot)$, specializes
to

$$M_F(t) = a_F(t) + \int_0^t M_F(t - \theta)\, dF(\theta) \tag{6.18}$$

for all $t \ge 0$. Since, as pointed out earlier, the mapping $a_F(\cdot)$ is monotone non-increasing and
takes non-negative values, it is consequently integrable owing to (6.10), and therefore directly
Riemann integrable ([10], pp. 190-191). The distribution $F(\cdot)$ has support on $\mathbb{N}$ and is thus
arithmetic, say with span d. These remarks validate an application of the Basic Renewal
Theorem ([10], Thm. 5.5.1, p. 191) on the renewal equation (6.18), in that the convergence

$$\lim_{n \to \infty} M_F(c + nd) = \frac{d}{\bar{\theta}} \sum_{n=0}^{\infty} a_F(c + nd) \tag{6.19}$$

takes place. Here, $\bar{\theta}$ denotes the mean of the distribution $F(\cdot)$ which is finite as noted at the
end of Section 5.
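The convergence (6.19) can be observed on a small arithmetic example with span d = 1 and c = 0. The sketch below iterates a discrete form of the renewal equation (6.18), taken here to be M(n) = a(n) + Σ_{j≤n} f(j) M(n-j), for a hypothetical cycle-length distribution `f` and the choice r(x) = x. The discrete forcing term sums over cycle lengths strictly larger than n, which matches the convention $\tau_k \le t$ in the definition of N(t); these modelling choices are assumptions of the example.

```python
def solve_renewal(f, r, horizon):
    """Iterate M(n) = a(n) + sum_{1 <= j <= n} f(j) * M(n - j) for n = 0..horizon.

    `f` maps positive integer cycle lengths to probabilities; a(n) is the
    discrete analogue of a_F in (6.12), summing r(theta - n) over theta > n.
    """
    def a(n):
        return sum(p * r(theta - n) for theta, p in f.items() if theta > n)

    values = []
    for n in range(horizon + 1):
        values.append(a(n) + sum(p * values[n - j] for j, p in f.items() if j <= n))
    return values, a


f = {1: 0.2, 2: 0.5, 3: 0.3}            # hypothetical distribution F on {1, 2, 3}
M, a = solve_renewal(f, lambda x: float(x), 200)

mean = sum(theta * p for theta, p in f.items())
limit = sum(a(n) for n in range(max(f) + 1)) / mean   # right-hand side of (6.19)
print(M[0], M[200], limit)
```

Here M(0) equals the mean cycle length 2.1, and M(n) settles at 3.5/2.1, the stationary mean residual time, which is finite exactly because the double sum corresponding to (6.10) is finite.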

The mapping $M_F(\cdot)$ is non-increasing on each one of the intervals [n,n+1), whereas the
mapping $a_F(\cdot)$ is non-increasing. These properties and (6.19) readily imply that

_  J  <X  J  CO 

lim  Mp  (t)  <  —  Yi  aF  (nd )  <  —  Y  aF  (n  ) 

(  T°°  0  r.  =0  $  r.  =0 


occc 

/  /  r  (0-t )  dF  ( 9 )  dt 

0  t 


<  oo 


(6.20) 


where the finiteness of the bound in (6.20) follows from the assumptions (6.9)-(6.10). Since
(6.9)-(6.10) imply that for each t, $M_F(t)$ is finite, the bound (6.17) obtains and this completes
the proof as pointed out earlier. □


Proof of Theorem 6.1: Start with the mapping $r(\cdot)$ given by $r(x) = x^{\gamma-1}$ for all $x \ge 0$,
and note from elementary calculations that

$$\int_0^\infty\!\!\int_0^{\theta} r(\theta - t)\, dt\, dF(\theta) = \frac{1}{\gamma} \int_0^\infty \theta^{\gamma}\, dF(\theta). \tag{6.21}$$

It follows from Theorem 5.4 and (6.21) that the conditions (6.9)-(6.10) hold true, and a
straightforward application of Theorem 6.2 thus implies (6.3). □
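For completeness, the elementary calculation behind (6.21) can be spelled out as follows, reading the mapping as $r(x) = x^{\gamma-1}$ with $\gamma \ge 1$; the inner integral is computed for each fixed $\theta$ and then integrated against $F$.

```latex
\int_0^\infty\!\!\int_0^{\theta} (\theta - t)^{\gamma-1}\, dt\, dF(\theta)
  = \int_0^\infty \Big[ -\frac{(\theta - t)^{\gamma}}{\gamma} \Big]_{t=0}^{t=\theta} dF(\theta)
  = \frac{1}{\gamma} \int_0^\infty \theta^{\gamma}\, dF(\theta).
```

The right-hand side is the $\gamma$-th moment of a busy-cycle length divided by $\gamma$, so condition (6.10) reduces to the moment bound (5.19).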


It is noteworthy that if the much stronger hypothesis (R4bis) is substituted for (R4), where

(R4bis): For some constants $\lambda > 0$ and $0 < D < \infty$, the bounds
$$E^{\pi}\big[e^{\lambda |\Xi|}\big] \le D, \qquad E^{\pi}\big[e^{\lambda |A(n)|}\big] \le D$$
hold true,

then the results of Section 5 and Theorem 6.2 (with $r(x) = e^{\lambda x}$) imply the bound
$E^{\pi}\big[e^{\lambda |X(n)|}\big] \le D_1 < \infty$ for all n=1,2,.... This can also be proved directly by an easy
application of the results of Hajek [8] without referring to the results of Section 5.

This section closes with a useful representation result for the cost under any non-idling
Markov stationary policy. For any such policy g, the sequence $\{X(n)\}_1^\infty$ is a homogeneous
Markov chain over the state-space $\mathbb{N}^{K+1}$ under the probability measure $P^{g}$ induced by the
policy g. All states communicate with each other under the assumed stability condition $\rho < 1$
since $0 < \mu_k < 1$, whence the chain is irreducible. From this property, it is clear that the
chain is also aperiodic for the empty state is aperiodic. Moreover, the condition $\gamma > 1$ in the
assumption (R4) gives the finite mean property (5.20) which readily implies that the empty
state is positive recurrent and so are all the states by virtue of the irreducibility of the chain.

It now follows from standard results on Markov chains ([10], Thm. 3.1.3, pp. 85) that
the Markov chain $\{X(n)\}_1^\infty$ admits under $P^{g}$ a unique invariant measure, which is denoted
throughout by $\mathbb{P}^{g}$ with corresponding expectation operator $\mathbb{E}^{g}$.

Theorem 6.3. Under the foregoing assumptions (R1)-(R4), if the RV's $\{c(X(n))\}_1^\infty$ are
uniformly integrable under the probability measure $P^{g}$ associated with the non-idling Markov
stationary policy g, then the following convergence results hold:

(i): With X denoting a generic $\mathbb{N}^{K+1}$-valued RV,

$$\lim_{n \to \infty} E^{g}[c(X(n))] = \mathbb{E}^{g}[c(X)] \tag{6.22a}$$

independently of the initial state distribution, and

(ii):

$$J(g) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} c(X(i)) \tag{6.22b}$$

where the convergence takes place $P^{g}$-a.s. and in $L^1(\Omega, F, P^{g})$.

It should be clear to the reader that (6.22a)-(6.22b) implies

$$J(g) = \lim_{n \to \infty} \frac{1}{n}\, E^{g} \sum_{i=1}^{n} c(X(i)) = \mathbb{E}^{g}[c(X)] \tag{6.23}$$

whenever the RV's $\{c(X(n))\}_1^\infty$ are uniformly integrable under the probability measure $P^{g}$.
Proof: (i): For each B > 0, introduce the so-called truncation of c at level B as the bounded
mapping $c_B: \mathbb{N}^{K+1} \to \mathbb{R}$ defined by $c_B(x) := (c(x) \vee -B) \wedge B$ for all x in $\mathbb{N}^{K+1}$. With this
notation, the elementary inequality

$$\begin{aligned}
\big| E^{g}[c(X(n))] - \mathbb{E}^{g}[c(X)] \big| \le{}& \big| E^{g}[c(X(n))] - E^{g}[c_B(X(n))] \big| \\
&+ \big| E^{g}[c_B(X(n))] - \mathbb{E}^{g}[c_B(X)] \big| \\
&+ \big| \mathbb{E}^{g}[c_B(X)] - \mathbb{E}^{g}[c(X)] \big|, \qquad n=1,2,\ldots
\end{aligned} \tag{6.24}$$

is clearly valid for every B > 0. The argument for establishing (6.22a) thus reduces to
showing that each one of the difference terms can be made arbitrarily small as n goes to
infinity.

Since the chain is ergodic under $P^{g}$, it is well known ([10], Thm. 3.1.3, pp. 85) that
$\lim_{n} P^{g}[X(n) = x] = \mathbb{P}^{g}[X = x]$ for all x in $\mathbb{N}^{K+1}$, and the convergence

$$\lim_{n \to \infty} E^{g}[b(X(n))] = \mathbb{E}^{g}[b(X)] \tag{6.25}$$

thus takes place for any bounded mapping $b: \mathbb{N}^{K+1} \to \mathbb{R}$. The assumed uniform
integrability now implies that for all B > 0,

$$\sup_{n} E^{g}\big[\,|c_B(X(n))|\,\big] \le \sup_{n} E^{g}\big[\,|c(X(n))|\,\big] < \infty \tag{6.26}$$

whence

$$\mathbb{E}^{g}\big[\,|c_B(X)|\,\big] = \lim_{n \uparrow \infty} E^{g}\big[\,|c_B(X(n))|\,\big] \le \sup_{n} E^{g}\big[\,|c(X(n))|\,\big] < \infty \tag{6.27}$$

with the equality following from (6.25). The Monotone Convergence Theorem, used on the
positive and negative parts of $c_B(X)$, readily yields the conclusion that

$$\lim_{B \uparrow \infty} \mathbb{E}^{g}\big[\,|c_B(X)|\,\big] = \mathbb{E}^{g}\big[\,|c(X)|\,\big] \le \sup_{n} E^{g}\big[\,|c(X(n))|\,\big] < \infty, \tag{6.28a}$$

$$\lim_{B \uparrow \infty} \mathbb{E}^{g}[c_B(X)] = \mathbb{E}^{g}[c(X)]. \tag{6.28b}$$

Consequently, $\lim_{B \to \infty} \big| \mathbb{E}^{g}[c(X)] - \mathbb{E}^{g}[c_B(X)] \big| = 0$, i.e., for every $\varepsilon > 0$, there exists $B_{\varepsilon} > 0$
such that for $B > B_{\varepsilon}$,

$$\big| \mathbb{E}^{g}[c(X)] - \mathbb{E}^{g}[c_B(X)] \big| < \varepsilon. \tag{6.29}$$

The convergence (6.25) can be expressed by saying that for every B > 0 and every $\varepsilon > 0$,
there exists $n(\varepsilon,B)$ in $\mathbb{N}$ such that whenever $n \ge n(\varepsilon,B)$,

$$\big| E^{g}[c_B(X(n))] - \mathbb{E}^{g}[c_B(X)] \big| < \varepsilon. \tag{6.30}$$

Finally, the very definition of $c_B$ implies

$$\big| E^{g}[c(X(n))] - E^{g}[c_B(X(n))] \big| \le E^{g}\big[\, I[\,|c(X(n))| > B\,]\; |c(X(n))| \,\big], \qquad n=1,2,\ldots \tag{6.31}$$

and by the uniform integrability of the RV's $\{c(X(n))\}_1^\infty$, for every $\varepsilon > 0$, there exists
$\tilde{B}_{\varepsilon} > 0$ such that

$$\sup_{n} \big| E^{g}[c(X(n))] - E^{g}[c_B(X(n))] \big| \le \sup_{n} E^{g}\big[\, I[\,|c(X(n))| > B\,]\; |c(X(n))| \,\big] < \varepsilon \tag{6.32}$$

whenever $B > \tilde{B}_{\varepsilon}$.

The reader will readily check from (6.29), (6.30) and (6.32) that

$$\big| E^{g}[c(X(n))] - \mathbb{E}^{g}[c(X)] \big| < 3\varepsilon \tag{6.33}$$

whenever $n \ge n(\varepsilon, B_{\varepsilon} \vee \tilde{B}_{\varepsilon})$ and this completes the proof of (6.22a) since $\varepsilon$ is arbitrary.

(ii): For ease of exposition, define the $\mathbb{R}$-valued RV's $\{Y_c(n)\}_1^\infty$ by

$$Y_c(n) := \frac{1}{n} \sum_{i=1}^{n} c(X(i)), \qquad n=1,2,\ldots \tag{6.34}$$

The uniform integrability of the RV's $\{c(X(n))\}_1^\infty$ carries over to the RV's $\{Y_c(n)\}_1^\infty$ and
consequently, it suffices to establish in (6.22b) the $P^{g}$-a.s. convergence of the sequence of
RV's $\{Y_c(n)\}_1^\infty$ ([6], Thm. 4.5.4, pp. 97-98). As discussed by Chung ([5], Thm. 1.15.2, pp.
92), this convergence takes place with

$$\lim_{n \to \infty} Y_c(n) = \mathbb{E}^{g}[c(X)], \tag{6.35}$$

provided the condition $\mathbb{E}^{g}\big[\,|c(X)|\,\big] < \infty$ holds. This absolute summability condition was
obtained earlier in the proof as (6.28a) and this completes the proof of part (ii). □
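The truncation device used in part (i) of the proof is easy to visualise on a finite example. The sketch below clips a cost at level B and evaluates the truncation error $E|c - c_B|$ under a hypothetical distribution with geometrically decaying weights and the quadratic cost $c(x) = x^2$; both choices are illustrative assumptions.

```python
def truncate(value, level):
    """c_B = (c v -B) ^ B: clip a cost value to the interval [-B, B]."""
    return min(max(value, -level), level)


def truncation_error(costs, probs, level):
    """E|c - c_B| under a finite distribution given by matching lists."""
    return sum(p * abs(c - truncate(c, level)) for c, p in zip(costs, probs))


states = range(31)
weights = [2.0 ** (-x) for x in states]
probs = [w / sum(weights) for w in weights]      # hypothetical law on {0, ..., 30}
costs = [float(x * x) for x in states]           # hypothetical cost c(x) = x**2

errors = [truncation_error(costs, probs, level) for level in (1, 10, 100, 1000)]
print(errors)
```

The error decreases as B grows and vanishes once B exceeds the largest cost, which mirrors the role of (6.29) and (6.32) in controlling the first and third terms of (6.24).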


7.  AN  EXTENSION  OF  MANDL’S  RESULT: 

Throughout  this  section,  let  g  denote  a  fixed  non-idling  Markov  stationary  policy  in  n 
and  let  a  be  a  second  policy  in  n  (which  is  not  necessarily  non-idling).  The  results  will  be 
given  in  as  generic  a  form  as  possible  to  emphasize  the  broad  applicability  of  the  methodol¬ 
ogy. 

Theorem 7.1. Under the foregoing assumptions (R1)-(R3), assume the assumptions (H1)-(H5) to be enforced. Whenever the policy $a$ satisfies the convergence condition (C) with respect to the non-idling policy $g$, the convergence

$$J(a) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} c(X(i)) = J(g) = \mathbb{E}^g c(X) \tag{7.1}$$

takes place in $L^1(\Omega,F,P^a)$.

In order to prove Theorem 7.1, it is necessary to extend an argument given by Mandl [13] to the case of unbounded costs over countable state-spaces and randomized strategies. The forthcoming discussion leads via several lemmas of independent interest to a proof of the main Theorem 7.1. The first lemma provides a well-known characterization of the long-run average cost $J(g)$, which is valid for an arbitrary (not necessarily non-idling) Markov stationary policy.

Lemma 7.1. If the mapping $h: \mathbb{N}^{K+1} \to \mathbb{R}$ and the constant $J$ solve the equations

$$h(x) + J = \sum_y p(x,y;g(x))\,h(y) + c(x) \tag{7.2}$$

for all $x$ in $\mathbb{N}^{K+1}$ under the conditions

$$E^g|h(X(n))| < \infty, \qquad n=1,2,\ldots \tag{7.3a}$$

and

$$\lim_{n\to\infty}\frac{1}{n}E^g[h(X(n))] = 0, \tag{7.3b}$$

then necessarily

$$J = J(g) = \lim_{n\to\infty}\frac{1}{n}E^g\sum_{i=1}^{n} c(X(i)). \tag{7.4}$$

Proof: The formula (2.3) for the transition probabilities allows a rewriting of (7.2) in the form

$$h(X(i)) + J = E^g[h(X(i+1)) \mid F_i] + c(X(i)), \qquad i=1,2,\ldots \tag{7.5}$$

and a direct iteration then gives

$$E^g[h(X(1))] + nJ = E^g[h(X(n+1))] + E^g\Big[\sum_{i=1}^{n} c(X(i))\Big], \qquad n=1,2,\ldots \tag{7.6}$$

The result now follows readily upon dividing by $n$ in (7.6), letting $n$ go to infinity and using (7.3). □
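The mechanism of Lemma 7.1 can be checked numerically on a finite state space. The sketch below is only an illustration under stated assumptions: the three-state chain `P`, the cost vector `c` and the candidate pair `(h, J)` are hypothetical choices that do not come from the paper, whose state space is countable. The code verifies the equation (7.2) exactly, then compares both sides of the iterated identity (7.6) using exact rational arithmetic.

```python
from fractions import Fraction as F

# Hypothetical three-state birth-death chain and one-step cost (not from the paper).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0),    F(1, 2), F(1, 2)]]
c = [F(0), F(1), F(3)]

# Candidate solution pair (h, J) of (7.2), normalised by h(0) = 0.
h = [F(0), F(5, 2), F(6)]
J = F(5, 4)


def solves_poisson(P, c, h, J):
    """True if h(x) + J == sum_y P[x][y] * h(y) + c(x) for every state x."""
    n = len(P)
    return all(h[x] + J == sum(P[x][y] * h[y] for y in range(n)) + c[x]
               for x in range(n))


def iterate_identity(P, c, h, J, start, n):
    """Return (lhs, rhs, average cost) of identity (7.6) when X(1) = start."""
    size = len(P)
    dist = [F(int(x == start)) for x in range(size)]      # law of X(1)
    lhs = sum(d * v for d, v in zip(dist, h)) + n * J      # E h(X(1)) + nJ
    total_cost = F(0)
    for _ in range(n):
        total_cost += sum(d * v for d, v in zip(dist, c))  # adds E c(X(i))
        dist = [sum(dist[x] * P[x][y] for x in range(size))
                for y in range(size)]                      # law of X(i+1)
    rhs = sum(d * v for d, v in zip(dist, h)) + total_cost # E h(X(n+1)) + E sum c
    return lhs, rhs, total_cost / n


if __name__ == "__main__":
    print(solves_poisson(P, c, h, J))
    lhs, rhs, avg = iterate_identity(P, c, h, J, start=2, n=50)
    print(lhs == rhs, float(avg), float(J))
```

Since $h$ is bounded on a finite state space, the gap between the $n$-step average cost and $J$ is at most $\max_x |h(x)| \cdot 2/n$ here, which is the finite-state counterpart of condition (7.3b).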

When the cost function $c$ is bounded, at least one solution pair $(h,J)$ can be shown to exist which satisfies the conditions of Lemma 7.1. This existence result is established by a standard argument available in the monographs by Ross [18, 19]: Let the expected discounted cost function associated with the one-step cost $c$ be denoted by $C_\beta$ whenever the non-idling policy $g$ is used over the infinite horizon and the discount factor is $\beta<1$, i.e., for all $x$ in $\mathbb{N}^{K+1}$ pose

$$C_\beta(x) := E^g\Big[\sum_{i=1}^{\infty}\beta^{i-1} c(X(i)) \,\Big|\, X(1)=x\Big]. \tag{7.7}$$

In that case, it is plain that for all $x$ in $\mathbb{N}^{K+1}$, the relation

$$C_\beta(x) = c(x) + \beta\sum_y p(x,y;g(x))\,C_\beta(y) \tag{7.8}$$

holds, or equivalently,

$$(1-\beta)\,C_\beta(0) + h_\beta(x) = c(x) + \beta\sum_y p(x,y;g(x))\,h_\beta(y), \tag{7.9}$$

where the definition

$$h_\beta(x) := C_\beta(x) - C_\beta(0) \tag{7.10}$$

has been posed for all $x$ in $\mathbb{N}^{K+1}$.

This last remark is useful for establishing an intermediary result given in the next proposition. The discussion is presented under condition (H1bis), where

(H1bis): The uniform bound

$$\sup_n E^g[Z(X(n))] < \infty$$

holds. It should be clear to the reader that (H1bis) is a weaker condition than (H1).


Lemma 7.2. Assume the mapping $c$ to be bounded and the non-idling policy $g$ to satisfy the condition (H1bis). Under the assumptions (R1)-(R3), with the notation and definitions given above,

(1): There exists some constant $C>0$ such that

$$|h_\beta(x)| \le C\,Z(x) \tag{7.11}$$

for all $x$ in $\mathbb{N}^{K+1}$ and all $\beta$ in $(0,1)$,

(2): The convergence

$$\lim_{\beta\uparrow 1}(1-\beta)\,C_\beta(0) = J(g) = \mathbb{E}^g c(X) \tag{7.12}$$

takes place, and

(3): There exists a pair $(h,J)$ which satisfies (7.2)-(7.3), given by $J = J(g)$ and

$$h(x) := \lim_{\beta\uparrow 1} h_\beta(x), \tag{7.13a}$$

the limit being taken along a single subsequence, with the property

$$|h(x)| \le C\,Z(x) \tag{7.13b}$$

for all $x$ in $\mathbb{N}^{K+1}$.
Proof: (1): Fix $x \neq 0$ in $\mathbb{N}^{K+1}$. With the notation (5.1), standard arguments yield

$$C_\beta(x) - C_\beta(0) = E^g\Big[\sum_{i=1}^{\tau_1}\beta^{i-1} c(X(i)) \,\Big|\, X(1)=x\Big] - C_\beta(0)\,E^g\big[1-\beta^{\tau_1} \,\big|\, X(1)=x\big]. \tag{7.14}$$

It is clear that

$$\Big| E^g\Big[\sum_{i=1}^{\tau_1}\beta^{i-1} c(X(i)) \,\Big|\, X(1)=x\Big] \Big| \le |c|\,E^g[\tau_1 \mid X(1)=x] \tag{7.15}$$

whereas the easy bound

$$|C_\beta(0)| \le \frac{|c|}{1-\beta} \tag{7.16}$$

immediately implies

$$|C_\beta(0)|\,E^g\big[1-\beta^{\tau_1} \,\big|\, X(1)=x\big] \le |c|\,E^g\Big[\frac{1-\beta^{\tau_1}}{1-\beta} \,\Big|\, X(1)=x\Big] \le |c|\,E^g[\tau_1 \mid X(1)=x] \tag{7.17}$$

by standard properties of geometric series. Use of (7.15) and (7.17) on (7.14) readily leads to

$$|h_\beta(x)| \le 2|c|\,E^g[\tau_1 \mid X(1)=x] \le \frac{2|c|}{1-\rho}\,Z(x), \tag{7.18}$$

where the last inequality follows from (5.6), and (7.11) obtains with $C := \frac{2|c|}{1-\rho}$.

(2): Since $c$ is bounded, Theorem 6.3 applies and the result (7.12) readily follows from (6.23), the definition of $J(g)$ and the version of the Tauberian Theorem stated in Prop. 4-7 of ([9], p. 173).

(3): Owing to (7.11), the mapping $\beta \to h_\beta(x)$ is bounded for all $x$ in $\mathbb{N}^{K+1}$. A simple diagonalization argument then implies the existence of a subsequence $\{\beta_n\}_1^\infty$ in $[0,1]$, with $\beta_n \uparrow 1$ as $n \uparrow \infty$, along which the sequence $\{h_{\beta_n}(x)\}_1^\infty$ has a well-defined limit $h(x)$ for all $x$ in $\mathbb{N}^{K+1}$. The mapping $h$ clearly satisfies (7.13b) and therefore enjoys the properties (7.3) under the uniform bound (H1bis). A simple bounding argument, that uses the form of the dynamics (2.2) and the enforced assumptions (R1)-(R3), easily implies that

$$0 \le \sum_y p(x,y;g(x))\,Z(y) \le Z(x) + \rho < \infty \tag{7.19}$$

for all $x$ in $\mathbb{N}^{K+1}$. Dominated convergence, coupled to (7.11), (7.13b) and (7.19), now gives

$$\lim_{n\to\infty}\beta_n\sum_y p(x,y;g(x))\,h_{\beta_n}(y) = \sum_y p(x,y;g(x))\,h(y) \tag{7.20}$$

for all $x$ in $\mathbb{N}^{K+1}$. Upon taking the limit in (7.9), these remarks and (7.20) readily imply that the pair $(h,J(g))$ indeed solves (7.2)-(7.3). □
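The vanishing-discount argument behind Lemma 7.2 can be observed numerically. The following sketch is an illustration only: it uses a hypothetical three-state chain and cost vector (not the queueing model of the paper), solves the fixed-point relation (7.8) by successive approximation, and then forms $(1-\beta)C_\beta(0)$ and $h_\beta(x) = C_\beta(x) - C_\beta(0)$ as in (7.10). For this particular chain the average cost is $5/4$ and the solution of (7.2) normalised by $h(0)=0$ is $(0, 5/2, 6)$.

```python
# Hypothetical finite chain and one-step cost, chosen for illustration only.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
c = [0.0, 1.0, 3.0]


def discounted_cost(P, c, beta, iters=40000):
    """Iterate C <- c + beta * P C from C = 0 (a contraction of modulus beta)."""
    n = len(P)
    C = [0.0] * n
    for _ in range(iters):
        C = [c[x] + beta * sum(P[x][y] * C[y] for y in range(n))
             for x in range(n)]
    return C


def vanishing_discount(beta):
    """Return ((1 - beta) * C_beta(0), [h_beta(x) for each x])."""
    C = discounted_cost(P, c, beta)
    return (1.0 - beta) * C[0], [Cx - C[0] for Cx in C]


if __name__ == "__main__":
    for beta in (0.9, 0.99, 0.999):
        J_beta, h_beta = vanishing_discount(beta)
        print(beta, round(J_beta, 4), [round(v, 4) for v in h_beta])
```

On a finite chain the whole family $h_\beta$ converges as $\beta \uparrow 1$; the subsequence extraction in part (3) is what replaces this in the countable, unbounded setting of the paper.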
Throughout this section, let $h$ denote the mapping $\mathbb{N}^{K+1} \to \mathbb{R}$ whose existence was established in Lemma 7.2 when $c$ is a bounded mapping. The sequences $\{\Phi(n)\}_1^\infty$ and $\{Y(n)\}_1^\infty$ of $\mathbb{R}$-valued RV's are defined to be

$$\Phi(n) = E^a[h(X(n+1)) \mid F_n] - E^g[h(X(n+1)) \mid F_n], \qquad n=1,2,\ldots \tag{7.21}$$

and

$$Y(n) = h(X(n+1)) - E^a[h(X(n+1)) \mid F_n], \qquad n=1,2,\ldots \tag{7.22}$$

with $Y(1) = h(X(1)) - E^a[h(X(1))]$. Note that these definitions are well posed under the assumptions (H1bis)-(H2) owing to the properties enjoyed by the mapping $h$.

Lemma 7.3. Assume the mapping $c$ to be bounded and the assumptions (H1)-(H2) to be enforced. Under the foregoing assumptions (R1)-(R3), whenever the policy $a$ satisfies the convergence condition (C) with respect to the non-idling policy $g$, the convergence

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\Phi(i) = 0 \tag{7.23}$$

takes place in $L^1(\Omega,F,P^a)$.

Proof: Let $f_k$ be the Markov stationary policy in $\Pi$ which always gives service attention to the $k$-th queue, i.e., for all $x$ in $\mathbb{N}^{K+1}$,

$$f_k(l,x) = \delta(k,l), \qquad 0 \le l,k \le K. \tag{7.24}$$

With this notation, it is plain from (2.3)-(2.5) that

$$\Phi(n) = \sum_{k=0}^{K}\big[a_n(k,H(n)) - g(k,X(n))\big]\,E^{f_k}[h(X(n+1)) \mid F_n] \tag{7.25}$$

for all $n=1,2,\ldots$ For each $0 \le k \le K$, observe from (7.13b) and the form of the one-step transition probabilities (2.3)-(2.5) that

$$\big|E^{f_k}[h(X(n+1)) \mid F_n]\big| \le C\,E^{f_k}[Z(X(n+1)) \mid F_n] \le C\,E^{f_k}[Z(X(n)) + Z(A(n)) \mid F_n] \le C\big(Z(X(n)) + \rho\big), \qquad n=1,2,\ldots \tag{7.26}$$

If the $\mathbb{R}$-valued RV's $\{\Delta(n)\}_1^\infty$ are defined by

$$\Delta(n) := \sum_{k=0}^{K}\big|a_n(k,H(n)) - g(k,X(n))\big|, \qquad n=1,2,\ldots \tag{7.27}$$

it is now immediate from (7.26) that
$$|\Phi(n)| \le C\big(Z(X(n)) + \rho\big)\,\Delta(n), \qquad n=1,2,\ldots \tag{7.28}$$

and consequently,

$$\frac{1}{n}\sum_{i=1}^{n}|\Phi(i)| \le \frac{C}{n}\sum_{i=1}^{n} Z(X(i))\,\Delta(i) + \frac{C\rho}{n}\sum_{i=1}^{n}\Delta(i), \qquad n=1,2,\ldots \tag{7.29}$$


Under the condition (C), the RV's $\{\Delta(n)\}_1^\infty$ converge to 0 in probability under $P^a$, i.e., for every $\varepsilon>0$ and $\delta>0$, there exists an integer $n(\varepsilon,\delta)$ in $\mathbb{N}$ with the property that whenever $n \ge n(\varepsilon,\delta)$ in $\mathbb{N}$,

$$P^a[\Delta(n) > \varepsilon] < \delta. \tag{7.30}$$

Cesaro convergence of the RV's $\{\Delta(n)\}_1^\infty$ to 0 also takes place in probability under $P^a$, and the RV's $\{\Delta(n)\}_1^\infty$ being uniformly bounded by $2K$, a well-known result on the convergence of uniformly integrable RV's ([6], Thm. 4.5.4, pp. 97-98) gives the convergence

$$\lim_{n\uparrow\infty}\frac{1}{n}\sum_{i=1}^{n}\Delta(i) = 0 \tag{7.31}$$

both in probability under $P^a$ and in $L^1(\Omega,F,P^a)$.

Moreover, the uniform integrability assumption (H2) is equivalent to the uniform bound

$$Z_a := \sup_n E^a[Z(X(n))] < \infty \tag{7.32}$$

and to the fact that for every $\eta>0$ there exists some $\delta(\eta)>0$ such that

$$\sup_n E^a[Z(X(n))\,I(A)] < \eta \tag{7.33}$$

for any event $A$ in $F$ with $P^a(A) < \delta(\eta)$.

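Now fix $\varepsilon>0$ and $\eta>0$. Upon combining (7.30) and (7.33), the reader will now readily check that for all $n \ge n(\varepsilon,\delta(\eta))$ in $\mathbb{N}$, $P^a[\Delta(n)>\varepsilon] < \delta(\eta)$ and

$$E^a\big[Z(X(n))\,I[\Delta(n)>\varepsilon]\big] < \eta, \tag{7.34}$$

whence

$$E^a\big[Z(X(n))\,\Delta(n)\big] \le \varepsilon\,E^a\big[Z(X(n))\big] + \eta \le \varepsilon Z_a + \eta. \tag{7.35}$$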

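It is now straightforward to see that for $n > n(\varepsilon,\delta(\eta))$ in $\mathbb{N}$,

$$E^a\Big[\frac{1}{n}\sum_{i=1}^{n} Z(X(i))\,\Delta(i)\Big] = E^a\Big[\frac{1}{n}\sum_{i=1}^{n(\varepsilon,\delta(\eta))} Z(X(i))\,\Delta(i)\Big] + E^a\Big[\frac{1}{n}\sum_{n(\varepsilon,\delta(\eta)) < i \le n} Z(X(i))\,\Delta(i)\Big]$$

$$\le 2K Z_a\,\frac{n(\varepsilon,\delta(\eta))}{n} + (\varepsilon Z_a + \eta)\,\frac{n - n(\varepsilon,\delta(\eta))}{n}. \tag{7.36}$$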

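Let $n$ go to infinity in (7.36) and observe that

$$\limsup_{n\uparrow\infty} E^a\Big[\frac{1}{n}\sum_{i=1}^{n} Z(X(i))\,\Delta(i)\Big] \le \varepsilon Z_a + \eta, \tag{7.37}$$

whence

$$\lim_{n\uparrow\infty} E^a\Big[\frac{1}{n}\sum_{i=1}^{n} Z(X(i))\,\Delta(i)\Big] = 0 \tag{7.38}$$

since both $\varepsilon$ and $\eta$ are arbitrary. The proof of Lemma 7.3 is now completed upon using (7.31) and (7.38) on (7.29). □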
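The role played by the discrepancy $\Delta(n)$ of (7.27) can be illustrated on a toy rule. The sketch below is hypothetical throughout: it ignores the state and history dependence of the paper's policies, takes $g$ to be a fixed service distribution over three queues, and lets the adaptive rule mix $g$ with uniform exploration of weight $1/n$. It then evaluates $\Delta(n)$ and its Cesaro mean exactly.

```python
from fractions import Fraction as F


def delta(a, g):
    """Discrepancy sum_k |a(k) - g(k)| between two service distributions."""
    return sum(abs(ak - gk) for ak, gk in zip(a, g))


def adaptive_rule(n, g):
    """Hypothetical adaptive rule: follow g, but explore uniformly w.p. 1/n."""
    eps = F(1, n)
    uniform = F(1, len(g))
    return [(1 - eps) * gk + eps * uniform for gk in g]


# Hypothetical target policy: always serve queue 0 (three queues in total).
g = [F(1), F(0), F(0)]

deltas = [delta(adaptive_rule(n, g), g) for n in range(1, 1001)]
cesaro_mean = sum(deltas) / len(deltas)

if __name__ == "__main__":
    print(float(deltas[0]), float(deltas[-1]), float(cesaro_mean))
```

In this deterministic toy case $\Delta(n)$ is proportional to $1/n$, so both $\Delta(n)$ and its Cesaro mean vanish; condition (C) asks for the analogous convergence in probability under $P^a$.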


Lemma 7.4. Assume the mapping $c$ to be bounded and the assumptions (H2) and (H3) to be enforced. Under the foregoing assumptions (R1)-(R3), the convergence

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} Y(i) = 0 \tag{7.39}$$

takes place $P^a$-a.s. and in $L^1(\Omega,F,P^a)$.


Proof: The sequence $\{Y(n)\}_1^\infty$ forms a $(P^a,F_n)$-martingale difference sequence. Under (H3), the estimate

$$E^a\Big[\sum_{n=1}^{\infty}\frac{Y(n)^2}{n^2}\Big] \le 4C^2\,E^a\Big[\sum_{n=1}^{\infty}\frac{|Z(X(n))|^2}{n^2}\Big] < \infty \tag{7.40}$$

readily follows from Jensen's inequality and (7.13b). A martingale version of the Law of Large Numbers ([13], Theorem 3), the so-called Stability Theorem, thus applies to give the convergence (7.39) in the $P^a$-a.s. sense. Assumption (H2), when coupled to the estimate (7.13b), immediately implies the uniform integrability of the RV's $\{h(X(n))\}_1^\infty$ under the probability measure $P^a$, whence the convergence (7.39) also takes place in $L^1(\Omega,F,P^a)$ owing to standard results on the convergence of uniformly integrable RV's. □


Theorem 7.1 is now shown to hold under the more restrictive assumption that the mapping $c$ is bounded.

Theorem 7.2. Assume the mapping $c$ to be bounded and the assumptions (H1)-(H3) to be enforced. Under the foregoing assumptions (R1)-(R3), whenever the policy $a$ satisfies the condition (C) with respect to the non-idling policy $g$, the convergence

$$J(a) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} c(X(i)) = J(g) = \mathbb{E}^g c(X) \tag{7.41}$$

takes place in $L^1(\Omega,F,P^a)$.

Proof: The discussion follows closely the one given by Mandl ([13], p. 46): In order to compare the relative effects of the policies $g$ and $a$ on the cost, (7.5) is rewritten in the equivalent form

$$h(X(i)) + J(g) = -\Phi(i) - Y(i) + h(X(i+1)) + c(X(i)), \qquad i=1,2,\ldots \tag{7.42}$$

by adding and subtracting both RV's $E^a[h(X(i+1)) \mid F_i]$ and $h(X(i+1))$ on the right-hand side of (7.5). Iteration of (7.42) readily implies

$$J(g) + \frac{1}{n}\sum_{i=1}^{n}\Phi(i) + \frac{1}{n}\sum_{i=1}^{n} Y(i) = \frac{1}{n}\sum_{i=1}^{n} c(X(i)) + \frac{1}{n}\big(h(X(n+1)) - h(X(1))\big), \qquad n=1,2,\ldots \tag{7.43}$$

and note that

$$\lim_{n\to\infty}\frac{1}{n}E^a\big[|h(X(n+1))|\big] = \lim_{n\to\infty}\frac{1}{n}E^a\big[|h(X(1))|\big] = 0, \tag{7.44}$$

owing to either (H2) or (H3) when coupled to the estimate (7.13b). The result (7.41) is now easily obtained upon taking the limit in (7.43), with the help of (7.44), and applying Lemmas 7.2-7.4. □
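The passage from (7.5) to (7.42) and then to (7.43) can be spelled out as follows. This is a sketch that takes (7.5) with $J = J(g)$ and the definitions (7.21)-(7.22) of $\Phi(i)$ and $Y(i)$ at face value; no new assumption is introduced.

```latex
\begin{align*}
h(X(i)) + J(g)
  &= E^g[h(X(i+1)) \mid F_i] + c(X(i)) \\
  &= -\bigl( E^a[h(X(i+1)) \mid F_i] - E^g[h(X(i+1)) \mid F_i] \bigr)
     - \bigl( h(X(i+1)) - E^a[h(X(i+1)) \mid F_i] \bigr) \\
  &\qquad + h(X(i+1)) + c(X(i)) \\
  &= -\Phi(i) - Y(i) + h(X(i+1)) + c(X(i)).
\end{align*}
% Summing over i = 1, ..., n, the terms in h telescope:
\begin{align*}
\sum_{i=1}^{n} h(X(i)) + n J(g)
  &= -\sum_{i=1}^{n} \Phi(i) - \sum_{i=1}^{n} Y(i)
     + \sum_{i=1}^{n} h(X(i+1)) + \sum_{i=1}^{n} c(X(i)), \\
n J(g) + \sum_{i=1}^{n} \Phi(i) + \sum_{i=1}^{n} Y(i)
  &= \sum_{i=1}^{n} c(X(i)) + h(X(n+1)) - h(X(1)).
\end{align*}
```

Dividing the last line by $n$ gives the averaged identity, in which the two Cesaro sums on the left are controlled by Lemmas 7.3 and 7.4 and the boundary term on the right by (7.44).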

Proof of Theorem 7.1: For each $B>0$, let $c_B$ be the mapping $\mathbb{N}^{K+1} \to \mathbb{R}$ used in the proof of Theorem 6.3. Under the assumed conditions, Theorem 7.2 implies the convergence

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} c_B(X(i)) = \mathbb{E}^g c_B(X) \tag{7.45}$$

in $L^1(\Omega,F,P^a)$, i.e., for every $\varepsilon>0$ and every $B>0$, there exists $n(\varepsilon,B)$ in $\mathbb{N}$ with the property that for all $n \ge n(\varepsilon,B)$,

$$E^a\Big|\frac{1}{n}\sum_{i=1}^{n} c_B(X(i)) - \mathbb{E}^g c_B(X)\Big| < \varepsilon. \tag{7.46}$$

On the other hand, under assumption (H4), as pointed out in the proof of Theorem 6.3, $\mathbb{E}^g|c(X)| < \infty$ and for every $\varepsilon>0$, there exists $B_\varepsilon>0$ such that for $B \ge B_\varepsilon$,

$$\big|\mathbb{E}^g c(X) - \mathbb{E}^g c_B(X)\big| < \varepsilon. \tag{7.47}$$

It is also clear from the definition of the mapping $c_B$ that
$$E^a\Big|\frac{1}{n}\sum_{i=1}^{n}\big[c(X(i)) - c_B(X(i))\big]\Big| \le \frac{1}{n}E^a\Big[\sum_{i=1}^{n}|c(X(i))|\,I[\,|c(X(i))|>B\,]\Big] \tag{7.48}$$

whereas the uniform integrability condition (H5) guarantees that $\lim_{n\uparrow\infty} E^a\big[|c(X(n))|\,I[\,|c(X(n))|>B\,]\big] = 0$ uniformly in $B$. This last fact carrying over to Cesaro convergence, it follows that for every $\varepsilon>0$, there exists $m(\varepsilon)$ in $\mathbb{N}$ such that whenever $n \ge m(\varepsilon)$ in $\mathbb{N}$,

$$E^a\Big|\frac{1}{n}\sum_{i=1}^{n}\big[c(X(i)) - c_B(X(i))\big]\Big| < \varepsilon \tag{7.49}$$

for every $B>0$.

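To conclude the argument, observe that for every $n=1,2,\ldots$ and every $B>0$,

$$E^a\Big|\frac{1}{n}\sum_{i=1}^{n} c(X(i)) - \mathbb{E}^g c(X)\Big| \le E^a\Big|\frac{1}{n}\sum_{i=1}^{n}\big[c(X(i)) - c_B(X(i))\big]\Big| + E^a\Big|\frac{1}{n}\sum_{i=1}^{n} c_B(X(i)) - \mathbb{E}^g c_B(X)\Big| + \big|\mathbb{E}^g c(X) - \mathbb{E}^g c_B(X)\big|. \tag{7.50}$$

Now fix $\varepsilon>0$ and pose $n(\varepsilon) := m(\varepsilon) \vee n(\varepsilon,B_\varepsilon)$. It is now clear from (7.46)-(7.50) that whenever $n \ge n(\varepsilon)$ in $\mathbb{N}$,

$$E^a\Big|\frac{1}{n}\sum_{i=1}^{n} c(X(i)) - \mathbb{E}^g c(X)\Big| < 3\varepsilon \tag{7.51}$$

and this completes the proof since $\varepsilon>0$ is arbitrary. □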
APPENDIX:


Lemma A.1. Under the condition $\rho<1$, the inequality

$$r(b) := a(z(b))\,b > 1 \tag{A.1}$$

holds for all $b>1$.

Proof: With the mapping $\varphi: [1,+\infty)\times\mathbb{R}^{K+1} \to \mathbb{R}$ being defined by

$$\varphi(b,a) := \log b + \sum_{k=0}^{K} a_k \log\Big(\frac{\mu_k}{b-1+\mu_k}\Big) \tag{A.2}$$

for all pairs $(b,a)$ in $[1,+\infty)\times\mathbb{R}^{K+1}$, it is plain from Jensen's inequality that

$$r(b) = E^\pi\big[e^{\varphi(b,A(n))}\big] \ge e^{\varphi(b,\lambda)} \tag{A.3}$$

for all $\pi$ in $\Pi$ and all $n=1,2,\ldots$ Simple calculations readily show that

$$\frac{\partial}{\partial b}\varphi(b,a) = \frac{1}{b}\Big[1 - \sum_{k=0}^{K} a_k\,g_k(b)\Big] \tag{A.4}$$

where the mappings $g_k: [1,+\infty) \to \mathbb{R}$ are defined by

$$g_k(b) := \frac{b}{b-1+\mu_k}, \qquad 0 \le k \le K. \tag{A.5}$$

Each one of these mappings is monotone decreasing on the interval $[1,+\infty)$ owing to the fact that

$$\frac{d}{db}g_k(b) = -\frac{1-\mu_k}{(b-1+\mu_k)^2} \le 0 \tag{A.6}$$

and consequently for all $b \ge 1$, $g_k(b) \le g_k(1) = \frac{1}{\mu_k}$. Therefore,

$$\frac{\partial}{\partial b}\varphi(b,\lambda) = \frac{1}{b}\Big[1 - \sum_{k=0}^{K}\lambda_k\,g_k(b)\Big] \ge \frac{1}{b}\Big[1 - \sum_{k=0}^{K}\lambda_k\,g_k(1)\Big] = \frac{1-\rho}{b} > 0 \tag{A.7}$$

and the mapping $b \to \varphi(b,\lambda)$ is thus monotone increasing with the property that $0 = \varphi(1,\lambda) < \varphi(b,\lambda)$ on the interval $(1,+\infty)$. The conclusion (A.1) now immediately follows from the inequality (A.3). □
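The monotonicity argument of Lemma A.1 lends itself to a quick numerical sanity check. The sketch below is an illustration under assumptions: the rates `lam` and `mu` are hypothetical values with $\rho = \sum_k \lambda_k/\mu_k = 0.7 < 1$ and $\mu_k \le 1$, and the formulas for $\varphi$ and its derivative are the ones read off from (A.2) and (A.4)-(A.5). It checks on a grid that the derivative is bounded below by $(1-\rho)/b$ and that $\varphi(b,\lambda)$ is positive and increasing for $b>1$.

```python
import math

# Hypothetical parameters (not from the paper): arrival rates and service
# parameters with mu_k <= 1 and rho = sum_k lam_k / mu_k = 0.7 < 1.
lam = [0.1, 0.2, 0.15]
mu = [0.5, 0.8, 0.6]
rho = sum(l / m for l, m in zip(lam, mu))


def phi(b, a, mu):
    """phi(b, a) = log b + sum_k a_k * log(mu_k / (b - 1 + mu_k))."""
    return math.log(b) + sum(ak * math.log(m / (b - 1.0 + m))
                             for ak, m in zip(a, mu))


def dphi_db(b, a, mu):
    """Closed form (1/b) * (1 - sum_k a_k * g_k(b)), g_k(b) = b / (b - 1 + mu_k)."""
    return (1.0 - sum(ak * b / (b - 1.0 + m) for ak, m in zip(a, mu))) / b


grid = [1.0 + 0.25 * j for j in range(1, 41)]   # b in (1, 11]
values = [phi(b, lam, mu) for b in grid]
lower_bounds_ok = all(dphi_db(b, lam, mu) >= (1.0 - rho) / b - 1e-12
                      for b in grid)

if __name__ == "__main__":
    print(rho, lower_bounds_ok, values[0], values[-1])
```

Since $r(b) \ge e^{\varphi(b,\lambda)}$ by (A.3), positivity of $\varphi(b,\lambda)$ on the grid is the numerical counterpart of the conclusion $r(b) > 1$.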


Proof of Theorem 4.1: Fix $b$ in the interval $[1,+\infty)$. For each $n=1,2,\ldots$, the $F_n$-stopping times $\sigma\wedge n$ and $\tau\wedge n$ being bounded, Doob's Optional Sampling Theorem [10], applied to the $(P^\pi,F_n)$-martingale $\{M(n,b)\}_1^\infty$, gives

$$E^\pi\big[M(\tau\wedge n,b) \mid F_{\sigma\wedge n}\big] = M(\sigma\wedge n,b), \qquad n=1,2,\ldots \tag{A.8}$$

The event $[n\le\sigma]$ is $F_{\sigma\wedge n}$-measurable and on it, the identity

$$M(\sigma\wedge n,b) = M(\tau\wedge n,b) = M(n,b), \qquad n=1,2,\ldots \tag{A.9}$$

holds. The relation

$$E^\pi\big[I[n\le\sigma]\,M(\tau\wedge n,b) \mid F_{\sigma\wedge n}\big] = I[n\le\sigma]\,M(\sigma\wedge n,b), \qquad n=1,2,\ldots \tag{A.10}$$

therefore follows, owing to the $F_{\sigma\wedge n}$-measurability of the RV $M(\sigma\wedge n,b)$, and the relation (A.8) thus reduces to

$$E^\pi\big[I[\sigma<n]\,M(\tau\wedge n,b) \mid F_{\sigma\wedge n}\big] = I[\sigma<n]\,M(\sigma\wedge n,b), \qquad n=1,2,\ldots \tag{A.11}$$
To proceed further, note that the RV's $\{M(\tau\wedge n,b)\}_1^\infty$ can be rewritten in the factored form

$$M(\tau\wedge n,b) = \frac{b^{L(\sigma\wedge n)}}{r(b)^{\sigma\wedge n}} \times \frac{b^{L(\tau\wedge n)-L(\sigma\wedge n)}}{r(b)^{\tau\wedge n-\sigma\wedge n}}\prod_{k=0}^{K} z_k(b)^{X_k(\tau\wedge n)} \tag{A.12}$$

for all $n=1,2,\ldots$ Substitution of this last expression into (A.11) thus leads to the equality

$$E^\pi\Big[I[\sigma<n]\,\frac{b^{L(\tau\wedge n)-L(\sigma\wedge n)}}{r(b)^{\tau\wedge n-\sigma\wedge n}}\prod_{k=0}^{K} z_k(b)^{X_k(\tau\wedge n)} \,\Big|\, F_{\sigma\wedge n}\Big] = I[\sigma<n]\prod_{k=0}^{K} z_k(b)^{X_k(\sigma\wedge n)}, \qquad n=1,2,\ldots \tag{A.13}$$

after some easy simplifications that exploit the fact that the first factor on the right-hand side of (A.12) is $F_{\sigma\wedge n}$-measurable.

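Direct inspection shows that on the event $[\sigma<n\le\tau]$, the relations

$$\tau\wedge n-\sigma\wedge n = n-\sigma\wedge n \quad\text{and}\quad L(\tau\wedge n)-L(\sigma\wedge n) = I[X(\sigma\wedge n)=0] \tag{A.14}$$

hold for all $n=1,2,\ldots$, whence

$$0 \le E^\pi\Big[I[\sigma<n\le\tau]\,\frac{b^{L(\tau\wedge n)-L(\sigma\wedge n)}}{r(b)^{\tau\wedge n-\sigma\wedge n}}\prod_{k=0}^{K} z_k(b)^{X_k(\tau\wedge n)} \,\Big|\, F_{\sigma\wedge n}\Big] \le E^\pi\Big[I[\sigma<n\le\tau]\,b^{I[X(\sigma\wedge n)=0]}\,\frac{r(b)^{\sigma\wedge n}}{r(b)^{n}} \,\Big|\, F_{\sigma\wedge n}\Big] \tag{A.15}$$

$$\le \frac{b\,r(b)^{\sigma\wedge n}}{r(b)^{n}}\,I[\sigma<n] \le \frac{b\,r(b)^{\sigma}}{r(b)^{n}}, \qquad n=1,2,\ldots \tag{A.16}$$

where the passage from (A.15) to (A.16) made use of the fact that the RV's $\sigma\wedge n$ and $I[\sigma<n]$ are both $F_{\sigma\wedge n}$-measurable. By Lemma A.1, $r(b)>1$ whenever $b>1$ under the assumed condition $\rho<1$, and the bounds (A.16) immediately imply that on $[\sigma<\infty]$,

$$\lim_{n\to\infty} E^\pi\Big[I[\sigma<n\le\tau]\,\frac{b^{L(\tau\wedge n)-L(\sigma\wedge n)}}{r(b)^{\tau\wedge n-\sigma\wedge n}}\prod_{k=0}^{K} z_k(b)^{X_k(\tau\wedge n)} \,\Big|\, F_{\sigma\wedge n}\Big] = 0 \qquad P^\pi\text{-a.s.} \tag{A.17}$$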


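On the other hand, on the event $[\tau<n]$,

$$\tau\wedge n-\sigma\wedge n = \tau-\sigma = \nu, \qquad L(\tau\wedge n)-L(\sigma\wedge n) = I[X(\sigma\wedge n)=0], \tag{A.18a}$$

while

$$X(\tau\wedge n) = 0, \qquad n=1,2,\ldots \tag{A.18b}$$

and therefore

$$E^\pi\Big[I[\tau<n]\,\frac{b^{L(\tau\wedge n)-L(\sigma\wedge n)}}{r(b)^{\tau\wedge n-\sigma\wedge n}}\prod_{k=0}^{K} z_k(b)^{X_k(\tau\wedge n)} \,\Big|\, F_{\sigma\wedge n}\Big] = E^\pi\Big[I[\tau<n]\,\frac{1}{r(b)^{\nu}} \,\Big|\, F_{\sigma\wedge n}\Big]\,b^{I[X(\sigma\wedge n)=0]} \tag{A.19}$$

$$= I[\sigma<n]\,E^\pi\Big[I[\tau<n]\,\frac{1}{r(b)^{\nu}} \,\Big|\, F_{\sigma}\Big]\,b^{I[X(\sigma)=0]} \tag{A.20}$$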


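for all $n=1,2,\ldots$ The passage from (A.19) to (A.20) is readily justified by well known properties of conditional expectations ([15], Thm. pp. ) based on the fact that the traces of the $\sigma$-fields $F_{\sigma\wedge n}$ and $F_\sigma$ coincide on the event $[\sigma<n]$. It now follows from the Monotone Convergence Theorem for conditional expectations that

$$\lim_{n\to\infty} E^\pi\Big[I[\tau<n]\,\frac{b^{L(\tau\wedge n)-L(\sigma\wedge n)}}{r(b)^{\tau\wedge n-\sigma\wedge n}}\prod_{k=0}^{K} z_k(b)^{X_k(\tau\wedge n)} \,\Big|\, F_{\sigma\wedge n}\Big] = I[\sigma<\infty]\,E^\pi\Big[I[\tau<\infty]\,\frac{1}{r(b)^{\nu}} \,\Big|\, F_\sigma\Big]\,b^{I[X(\sigma)=0]} \qquad P^\pi\text{-a.s.} \tag{A.21}$$

Upon combining (A.17) and (A.21), the reader will now check that

$$\lim_{n\to\infty} E^\pi\Big[I[\sigma<n]\,\frac{b^{L(\tau\wedge n)-L(\sigma\wedge n)}}{r(b)^{\tau\wedge n-\sigma\wedge n}}\prod_{k=0}^{K} z_k(b)^{X_k(\tau\wedge n)} \,\Big|\, F_{\sigma\wedge n}\Big] = I[\sigma<\infty]\,E^\pi\Big[I[\tau<\infty]\,\frac{1}{r(b)^{\nu}} \,\Big|\, F_\sigma\Big]\,b^{I[X(\sigma)=0]} \qquad P^\pi\text{-a.s.} \tag{A.22}$$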

whereas

$$\lim_{n\to\infty} I[\sigma<n]\prod_{k=0}^{K} z_k(b)^{X_k(\sigma\wedge n)} = I[\sigma<\infty]\prod_{k=0}^{K} z_k(b)^{X_k(\sigma)}. \tag{A.23}$$

The conclusion (4.18) is now obtained by letting $n$ go to infinity in (A.13) and by using the limits (A.22)-(A.23). □